





In March, 1903, the Worshipful Company of Drapers announced their intention
of granting £1,000 to the University of London to be devoted to the furtherance of
research and higher work at University College. After consultation between the
University and College authorities, the Drapers' Company presented £1,000 to the
University to assist the statistical work and higher teaching of the Department of
Applied Mathematics. It seemed desirable to commemorate this, probably the first
occasion on which a great City Company has directly endowed higher research work
in mathematical science, by the issue of a special series of memoirs in the
preparation of which the Department has been largely assisted by the grant. Such
is the aim of the present series of "Drapers' Company Research Memoirs."

K.P.



Mathematical Contributions to the Theory of Evolution. XIV. On the General
Theory of Skew Correlation and Non-linear Regression.

By Karl Pearson, F.R.S. 



Contents.

                                                                            Page
(1.) Introductory. General conceptions as to skew variation and correlation.
     General theory of skew variation within the limits of practical errors
     of sampling . . . 3
(2.) Generalised idea of correlation. The correlation ratio $\eta$ and its
     relation to the correlation coefficient $r$ . . . 9
(3.) Probable errors of the correlation ratio and other constants of the
     arrays. Probable error of $r$ . . . 11
(4.) On the higher types of regression. Homoscedastic and heteroscedastic
     systems. Homoclitic and heteroclitic systems . . . 21
(5.) Cubical regression. General equations for regression of any order . . . 23
(6.) Parabolic regression . . . 28
(7.) Linear regression . . . 30
(8.) Illustration A. On the skew correlation between number of branches to the
     whorl and position of the whorl on the spray in the case of Asperula
     odorata . . . 31
(9.) Illustration B. On the skew correlation between age and head height in
     girls . . . 34
(10.) Illustration C. On the skew correlation between size of cell and size of
     body in Daphnia magna . . . 38
(11.) Illustration D. On the skew correlation between number of branches to the
     whorl and position of the whorl on the stem in Equisetum arvense . . . 42
(12.) Quartic regression. Necessary criteria for various types of regression . . . 47
(13.) Illustration E. Calculation of quartic regression in the case of
     Equisetum arvense . . . 49
(14.) General conclusions. Nomenclature, clitic and scedastic curves. Difference
     between mere curve fitting and regression calculations. Remarks on
     retention of decimals . . . 51



(1.) Introductory. 

In a series of memoirs presented to the Royal Society I have endeavoured to show 
that the Gaussian-Laplace normal distribution is very far from being a general law of 
frequency distribution either for errors of observation* or for the distribution of 
deviations from type such as occur in organic populations.† It is quite true that the

* "On Errors of Judgment, &c.," 'Phil. Trans.,' A, vol. 198, pp. 236-299.
† "On Skew Variation, &c.," 'Phil. Trans.,' A, vol. 186, pp. 343-414.



4 PROFESSOR K. PEARSON ON THE GENERAL THEORY OF

normal distribution applies within certain fields with a remarkable degree of accuracy, 
notably in a whole series of anthropometric, particularly craniometric, observations.* 
In other fields it is not even approximately correct, for example in the distribution of 
barometric variations,† of grades of fertility and incidence of disease.‡ For such
cases I have introduced a series of skew frequency curves which serve the purpose of
describing the frequency of innumerable skew distributions well within the errors of
random sampling. An exact test for "goodness of fit" in the case of frequency
distributions has also been now provided.§

In dealing with frequency which diverges more or less conspicuously from the
normal law we require to bear in mind at least three important points : — 

(i.) Any expression for frequency must be a graduation formula. It is not a 
disadvantage, but a fundamental requisite that it should smooth off " Scheingipfeln," 
so far as these are irregularities within the limits of random sampling. 

Hence formulae like those provided by Thiele‖ and Wundt's pupils,¶ which depend
upon taking enough "moments" to reproduce the complete frequency, are à priori
fallacious. Many interpolation formulae would do this completely, but such
interpolation formulae are not graduation formulae.

(ii.) The graduation formula must not depend upon the calculation of constants
having such a high probable error that their value is practically worthless.

Now, the probable error of high moments and products increases rapidly with their 
dimensions ; hence there is, beyond the labour of arithmetic, a practical limit to the 
number of moments or products which can be effectively used in a graduation 
formula. 

(iii.) There must be a systematic method of approaching frequency distributions, 
which can be applied to all cases with reasonably practical ease. 

Now the immense majority, if not the totality, of frequency distributions in
homogeneous material show, when the frequency is indefinitely increased, a tendency to
give a smooth curve characterised by the following properties:

(i.) The frequency starts from zero, increases slowly or rapidly to a maximum, and
then falls again to zero, probably at a quite different rate, as the character for which
the frequency is measured is steadily increased. This is the almost universal
unimodal distribution of the frequency of homogeneous series. Homogeneity may

* 'Biometrika,' vol. I., p. 443; vol. II., p. 344; vol. III., p. 230.

† 'Phil. Trans.,' A, vol. 190, pp. 423-469.

‡ 'Phil. Trans.,' A, vol. 192, pp. 257-330; 'The Chances of Death,' vol. I., pp. 69, et seq.; 'Biometrika,'
vol. I., p. 134 and p. 292; and for disease, 'Phil. Trans.,' A, vol. 186, pp. 390 and 407; A, vol. 197,
p. 159.

§ 'Phil. Mag.,' vol. 50, 1900, pp. 157-174, and 'Biometrika,' vol. I., pp. 154-163.

‖ 'Forelaesninger over Almindelig Iagttagelseslaere,' Kjöbenhavn, 1889; 'Theory of Observations,'
London, 1903.

¶ Wundt, 'Philosophische Studien.' A whole series of papers, by G. F. Lipps and others, seems to me
to quite miss the point of (i.) and (ii.) above.



SKEW CORRELATION AND NON-LINEAR REGRESSION. 5 

for practical purposes be taken to imply unimodality, although the converse is very 
far from true. 

(ii.) In the next place there is generally contact of the frequency curve at the
extremities of the range. These characteristics at once suggest the following form of
frequency curve, if $y\,\delta x$ measure the frequency falling between $x$ and $x+\delta x$:

$$\frac{1}{y}\frac{dy}{dx} = \frac{x+a}{F(x)} \qquad \text{(i.)}.$$

For in this case we have one mode only of the frequency, i.e., at $x=-a$, and
$dy/dx$ will vanish when $y=0$.

But the assumption of this form, as long as $F(x)$ is general, is itself extremely
general, and it includes cases in which $dy/dx$ may not be zero, but take any values
from $0$ to $\infty$, when $y=0$.*

Now let us assume that $F(x)$ can be expanded by Maclaurin's theorem, and
equals $b_0+b_1x+b_2x^2+b_3x^3+\ldots$ Then our differential equation to the frequency
will be

$$\frac{1}{y}\frac{dy}{dx} = \frac{x+a}{b_0+b_1x+b_2x^2+b_3x^3+\ldots} \qquad \text{(ii.)}.$$



There is now absolutely no difficulty in determining the unknown constants in
terms of the moments of the system. Multiply up and also by $x^n$, and then integrate
throughout the range of frequency, we have

$$\int x^n\,(b_0+b_1x+b_2x^2+b_3x^3+\ldots)\,\frac{dy}{dx}\,dx = \int y\,(x+a)\,x^n\,dx \qquad \text{(iii.)}.$$

Or, noting that $y=0$ at the ends of the range we have, with the usual notation for a
total frequency $N$, i.e.,

$$N\mu'_n = \int y\,x^n\,dx \qquad \text{(iv.)},$$

the result by integration by parts

$$n\,b_0\,\mu'_{n-1} + (n+1)\,b_1\,\mu'_n + (n+2)\,b_2\,\mu'_{n+1} + (n+3)\,b_3\,\mu'_{n+2} + \ldots = -\mu'_{n+1} - a\,\mu'_n \qquad \text{(v.)}.$$



Hence, if we write $n=0, 1, 2, 3, \ldots, s$ successively, we have $s+1$ equations to find
$a, b_0, b_1, b_2, \ldots, b_{s-1}$ in terms of the moments. For example, if we stop at $b_0$ we
require two moments, at $b_1$ three moments, at $b_2$ four moments, at $b_3$ six moments, at
$b_4$ eight moments, and at $b_{s-1}$, $s>2$, $2s-2$ moments.
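The count of equations and unknowns above can be tried numerically. The following is a minimal sketch and not part of the memoir: it builds the $s+1$ equations (v.) for $n=0,\ldots,s$ and solves them exactly in rational arithmetic. The names `solve_linear` and `pearson_constants`, and the convention that `moments[k]` holds $\mu'_k$ with `moments[0] == 1`, are assumptions made for illustration.

```python
from fractions import Fraction


def solve_linear(aug):
    """Gauss-Jordan elimination on an augmented matrix of Fractions."""
    size = len(aug)
    for col in range(size):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def pearson_constants(moments, s):
    """Return [a, b_0, ..., b_{s-1}] from equations (v.) with n = 0..s.

    moments[k] is the k-th moment about the chosen origin (hypothetical
    layout); moments[0] must be 1.
    """
    mu = [Fraction(m) for m in moments]
    highest = max(2 * s - 2, s + 1)
    if len(mu) <= highest:
        raise ValueError(f"moments up to order {highest} are required")
    rows = []
    for n in range(s + 1):
        row = [mu[n]]  # coefficient of a
        for k in range(s):
            # Coefficient of b_k is (n + k) * mu'_{n+k-1}; it is zero for n = k = 0.
            row.append((n + k) * mu[n + k - 1] if n + k > 0 else Fraction(0))
        row.append(-mu[n + 1])  # right-hand side of (v.) with a moved across
        rows.append(row)
    return solve_linear(rows)
```

With the moments of a normal series about its mean, say `pearson_constants([1, 0, 4, 0, 48], 3)`, the solution is $a=0$, $b_0=-4$, $b_1=b_2=0$, so the higher constants drop out as the text leads one to expect.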

* For example, cases in which there is a minimum frequency or antimode at $x=-a$, and $dy/dx$ infinite at
one or two values for which $y=0$, as in the frequency distributions discussed in 'Phil. Trans.,' A, vol. 186,
pp. 364-5, and 'Roy. Soc. Proc.,' vol. 62, p. 287, "Cloudiness, a Novel Case of Frequency."




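There is no difficulty whatever in finding the $b$'s; we have the system of equations,
where $\mu'_0=1$:

$$\begin{aligned}
\mu'_0 a + 0\times b_0 + \mu'_0 b_1 + 2\mu'_1 b_2 + 3\mu'_2 b_3 + 4\mu'_3 b_4 + \ldots &= -\mu'_1 \\
\mu'_1 a + \mu'_0 b_0 + 2\mu'_1 b_1 + 3\mu'_2 b_2 + 4\mu'_3 b_3 + 5\mu'_4 b_4 + \ldots &= -\mu'_2 \\
\mu'_2 a + 2\mu'_1 b_0 + 3\mu'_2 b_1 + 4\mu'_3 b_2 + 5\mu'_4 b_3 + 6\mu'_5 b_4 + \ldots &= -\mu'_3 \\
\mu'_3 a + 3\mu'_2 b_0 + 4\mu'_3 b_1 + 5\mu'_4 b_2 + 6\mu'_5 b_3 + 7\mu'_6 b_4 + \ldots &= -\mu'_4 \\
\mu'_4 a + 4\mu'_3 b_0 + 5\mu'_4 b_1 + 6\mu'_5 b_2 + 7\mu'_6 b_3 + 8\mu'_7 b_4 + \ldots &= -\mu'_5
\end{aligned} \qquad \text{(vi.)}.$$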

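Hence, $a, b_0, b_1, b_2, b_3, \ldots$ are at once given in terms of the determinant $\Delta$ and
its minors, where:

$$\Delta = \begin{vmatrix}
\mu'_0 & 0 & \mu'_0 & 2\mu'_1 & 3\mu'_2 & 4\mu'_3 & \ldots \\
\mu'_1 & \mu'_0 & 2\mu'_1 & 3\mu'_2 & 4\mu'_3 & 5\mu'_4 & \ldots \\
\mu'_2 & 2\mu'_1 & 3\mu'_2 & 4\mu'_3 & 5\mu'_4 & 6\mu'_5 & \ldots \\
\mu'_3 & 3\mu'_2 & 4\mu'_3 & 5\mu'_4 & 6\mu'_5 & 7\mu'_6 & \ldots \\
\mu'_4 & 4\mu'_3 & 5\mu'_4 & 6\mu'_5 & 7\mu'_6 & 8\mu'_7 & \ldots \\
\ldots & & & & & &
\end{vmatrix} \qquad \text{(vii.)}.$$

The results may be simplified slightly by taking the origin at the mean, and the
moments about the mean, indicating this by dropping the dashes and putting $\mu'_1=0$.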

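Thus we have the following series of frequency curves, the origin being the
mean:

(i.) Keeping $b_0$ only

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x}{\mu_2} \qquad \text{(viii.)}.$$

This is the Laplace-Gaussian normal form.

(ii.) Keeping $b_0$, $b_1$ only

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x+\dfrac{\mu_3}{2\mu_2}}{\mu_2+\dfrac{\mu_3}{2\mu_2}\,x} \qquad \text{(ix.)}.$$

This is the Type III. curve of my memoir on skew variation.*

(iii.) Keeping $b_0$, $b_1$, $b_2$ only

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x+\dfrac{\mu_3(\mu_4+3\mu_2^2)}{10\mu_2\mu_4-18\mu_2^3-12\mu_3^2}}{\dfrac{\mu_2(4\mu_2\mu_4-3\mu_3^2)}{10\mu_2\mu_4-18\mu_2^3-12\mu_3^2}+\dfrac{\mu_3(\mu_4+3\mu_2^2)}{10\mu_2\mu_4-18\mu_2^3-12\mu_3^2}\,x+\dfrac{2\mu_2\mu_4-3\mu_3^2-6\mu_2^3}{10\mu_2\mu_4-18\mu_2^3-12\mu_3^2}\,x^2} \qquad \text{(x.)}.$$

* 'Phil. Trans.,' A, vol. 186, p. 373.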




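This equation gave Types I.-VI. of my two memoirs on skew variation,* and
provides at once the expressions

$$d = \text{distance from mode to mean} = \frac{\sigma\sqrt{\beta_1}\,(\beta_2+3)}{2\,(5\beta_2-6\beta_1-9)} \qquad \text{(xi.)},$$

$$\text{skewness} = \frac{\sqrt{\beta_1}\,(\beta_2+3)}{2\,(5\beta_2-6\beta_1-9)} \qquad \text{(xii.)},$$

where $\sigma=\sqrt{\mu_2}$, $\beta_1=\mu_3^2/\mu_2^3$, $\beta_2=\mu_4/\mu_2^2$, given in my memoir on the theory of errors
of observation without proof.†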
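As a worked check on these two constants, the sketch below evaluates them from the moments about the mean, taking the skewness as $\sqrt{\beta_1}(\beta_2+3)/\{2(5\beta_2-6\beta_1-9)\}$ and $d$ as $\sigma$ times the skewness. It is an illustration under that reading of (xi.) and (xii.); the function name `pearson_shape` is hypothetical, and the sign of the skewness (which follows the sign of $\mu_3$) is not tracked.

```python
from math import sqrt


def pearson_shape(mu2, mu3, mu4):
    """Return (beta1, beta2, skewness, d) from moments about the mean."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    denominator = 2 * (5 * beta2 - 6 * beta1 - 9)
    if denominator == 0:
        raise ValueError("5*beta2 - 6*beta1 - 9 vanishes; the ratio is undefined")
    skewness = sqrt(beta1) * (beta2 + 3) / denominator  # magnitude only
    d = sqrt(mu2) * skewness  # distance from mode to mean
    return beta1, beta2, skewness, d
```

For a Type III. (gamma) series with unit scale and index 4, the moments about the mean are $\mu_2=4$, $\mu_3=8$, $\mu_4=72$; `pearson_shape(4, 8, 72)` then gives $d=1$, which agrees with the known distance between the mean (4) and the mode (3) of that curve.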

There is no theoretical limit, however, to this process; we can from (vi.) and (vii.)
express the $a$ and $b$'s at once in terms of determinants, and expanding obtain forms
which, like the formulae of Thiele, will fit closer and closer to the observed
distribution of frequency, the more moments we take. But there are three fundamental
practical objections to this. These are the following:

(a.) Experience shows that the form (x.) suffices for certainly the great bulk of 
frequency distributions, i.e., it describes them effectively within the limits of random 
sampling. 

If the distribution be even approximately normal, the series in the denominator
converges very rapidly, for the coefficients of every power of $x$ vanish for moments
obeying the relationships:

$$\mu_{2s+1} = 0, \qquad \mu_{2s} = (2s-1)\,\mu_2\,\mu_{2s-2},$$

which hold for a normal series.

(b.) The labour of arithmetic and of analysis becomes very great, if we desire to
keep higher moments. If we go to $b_4$ we should have to calculate the first eight
moments of the observations about their centroid, a by no means easy task. Further,
the classification of the resulting curves and the criteria for the right one to use in a
special case, although not absolutely prohibitive, if we only go as far as $b_3$, are for
practical purposes idle in the case of taking into account $b_4$.

(c.) The probable errors of the higher moments are so large that the values found
for $\mu_7$, $\mu_8$, &c., are quite untrustworthy, and even that for $\mu_6$ is doubtful,‡ unless we
have frequency series far larger than usually occur in actual observations. This is a
strong argument against the utility of any descriptions of frequency, such as those
suggested by Thiele or Lipps, which depend upon moments higher than the fifth
or sixth.

* 'Phil. Trans.,' A, vol. 186, pp. 343-414, and 'Phil. Trans.,' A, vol. 197, pp. 443-459.

† 'Phil. Trans.,' A, vol. 198, p. 277.

‡ In 'Phil. Trans.,' A, vol. 185, pp. 71-110, I have given a method of breaking up a frequency
distribution into two normal series. I obtained long ago the criterion for determining whether such a
resolution is possible or not. But it involves moments higher than the fifth, and the probable error of the
criterion is thus so great that for practical purposes it is worthless.






The question of the probable deviations of the higher moments can be illustrated as
follows, by finding the standard deviation of the moment when we take a number of
random samples from a general population. Let $\Sigma_{\mu_n}$ be the standard deviation of $\mu_n$,
then $100\,\Sigma_{\mu_n}/\mu_n$ is the percentage variability of $\mu_n$ due to random sampling. The table
below shows the increase of these percentages in the case of the moments of normal
distributions, which, quite as well as any other, will illustrate the rapid increase in
probable error as we use higher and higher moments. The general values of the
standard deviations of some of the moments were first given by Czuber,* then
far more completely by Sheppard,† and a résumé of all the results recently in
'Biometrika.'‡

Percentage Variability in Moments due to Random Sampling when the Series
is supposed to be Normal.

Moment.     500 in series.     1000 in series.
$\mu_2$           6.3                4.5
$\mu_4$          14.6               10.3
$\mu_6$          30.1               21.3
$\mu_8$          60.6               42.9


Precisely the same rapid increase takes place when we find the variabilities of the
ratios $\mu_4/\mu_2^2$, $\mu_6/\mu_2^3$, $\mu_8/\mu_2^4$, &c., which are the forms in which the moments actually
occur in our coefficients. In this case we have to remember that errors in the
moments are correlated, but the correlations are given in the papers cited above.§ I
find in this case the following series, which is almost as suggestive as the previous
table.



Percentage Variabilities in Ratio of Moments due to Random Sampling, the
Series being Normal.

Ratio.              500 in series.     1000 in series.
$\mu_4/\mu_2^2$           7.3                5.2
$\mu_6/\mu_2^3$          23.3               16.5
$\mu_8/\mu_2^4$          55.1               39.0

The order of this increase of percentage variability, and therefore of probable error, 
is the same for skew as for normal variation, and it seems therefore, with the length 



* 'Theorie der Beobachtungsfehler,' S. 130, et seq.
† 'Phil. Trans.,' A, vol. 192, pp. 122, et seq.
‡ Vol. II., pp. 273-281.
§ Ibid., p. 277.




of the series in customary use, idle to use the 7th or 8th moments; these have
variabilities varying from 30 to 60 per cent. of their values, and accordingly we might
easily on a random sample reach a 7th or 8th moment having half, or double the value
it actually has in the general population. Constants based on these high moments
will be practically idle. They may enable us to describe closely an individual random
sample, but no safe argument can be drawn from this individual sample as to the
general population at large, at any rate so far as the argument is based on the constants
depending upon these high moments.

It seems to me accordingly obvious that, bearing in mind the object of a theory of
frequency (i.e., the description of the distribution in the general population by aid of
a graduated sample, agreeing with the general population within the probable errors
of random sampling), we can dismiss from practical use all theories which call upon
us to use moments as high as the seventh or eighth. Any use of the general form
(ii.) beyond $b_3$, indirectly or directly, involves such higher moments. Personally I am
inclined to doubt whether the continental series using higher moments are, from the
standpoint of graduation, nearly as good as my form (ii.).

Hence we seem driven to the skew curves embraced in (x.) as a practical frequency
series. If we have a frequency not described by (x.) we may, perhaps, use $\mu_5$ and $\mu_6$,*
but it is difficult to see how its description can possibly be bettered by the use of
still higher moments. This may seem a counsel of despair; but it is very far from
being so in reality when we remember that (x.) has proved its efficiency now (I might
almost say, without exception) in a wide range of economic, physical, biometric, and
actuarial data.

In this memoir on skew correlation I shall accordingly confine my attention, for the 
most part, to constants the discovery of which does not involve the use of moments 
or products of higher than six dimensions, judging all above this limit to be, as a rule, 
disqualified for practical service by the magnitude of their probable errors. 

(2.) Generalised Idea of Correlation. 

Given any two variables or characters A and B, we say that they are correlated 
when, with different values x of A, we do not find the same value y of B equally likely 
to be associated. In other words, certain values of B are relatively more likely to 
occur with the value x than others. The distribution of B's associated with a given 
value $x$ of A is termed an $x$-array of B's. If $N$ pairs of A and B are taken, and $n_x$ of
these have the character A $=x$, these $n_x$ form the $x$-array of B's. This array, like any
other frequency distribution, will have its mean, which we will denote by $y_x$, and its

* Referring to equation (ii.), I propose to call curves which stop at $b_q$ skew curves of the $q$th order.
Thus the normal curve is a skew curve of zero order; curve of Type III. is a skew curve of the 1st order;
Types I., II., V., and VI. are of the 2nd order. I hope shortly to publish a discussion of skew curves of the
3rd order to complete the practically legitimate range of such curves.


standard deviation, which we will denote by $\sigma_{n_x}$. The mean of all the B characters
shall be $\bar{y}$ and their variability given by the standard deviation $\sigma_y$. Similarly $\bar{x}$, $\sigma_x$
will denote the mean and standard deviation of the A's, and $n_y$, $x_y$, and $\sigma_{n_y}$ the
number of individuals, the mean and the standard deviation for a $y$-array of A's.

Now clearly a knowledge of $y_x$ and $\sigma_{n_x}$ will not fix the B's which will be found
associated with a given A, but it will define the limits of probable or even possible
B's. The curve obtained by plotting $y_x$ to $x$ is termed the regression curve of $y$ on $x$.
A curve in which the ratio of $\sigma_{n_x}$ to the standard deviation $\sigma_y$ is plotted to $x$ may be
termed a scedastic* curve. Since the standard deviation is always a positive
quantity, this curve always lies on one side of the axis; it is a horizontal line in the
case of normal correlation (i.e., the Gauss-Laplacian distribution of deviations) and
coincides with the axis, in any case where correlation passes into causation, i.e., when
one value of B only is associated with each A.

The mean ordinate of this curve would clearly be a sort of general measure of the
degree of correlation between A and B, but it seems for many reasons better to base
our measure on the mean square of the weighted standard deviations of the arrays, or

$$\sigma_a^2 = S(n_x\sigma_{n_x}^2)/N \qquad \text{(xiii.)}.$$

$\sigma_a$ will thus measure the average variability in B to be found associated with any A;
its vanishing will mean that the scedastic curve as defined above will coincide with
the axis. Now let a new quantity $\eta$, defined by

$$\sigma_a^2 = (1-\eta^2)\,\sigma_y^2 \qquad \text{(xiv.)},$$

be introduced. Then clearly $\eta$ must lie between $\pm1$, because $\sigma_a^2$ cannot be negative,
being the sum of a number of positive squares. I term $\eta$ the correlation ratio, to
distinguish it from the correlation coefficient represented by $r$. When $\eta=\pm1$ the
correlation is perfect or we have causation. Further we have by a well-known
property of moments, if

$$\sigma_M^2 = S\{n_x(y_x-\bar{y})^2\}/N \qquad \text{(xv.)},$$

$$\sigma_y^2 = \sigma_a^2+\sigma_M^2,$$

or

$$\eta = \sigma_M/\sigma_y \qquad \text{(xvi.)}.$$

This shows us that the correlation ratio is the ratio of the variability of the means
of the $x$-arrays to the variability of B's in general. If $\eta=0$, it follows that $\sigma_M$ is
zero, or from (xv.) that every $y_x=\bar{y}$, i.e., there is no association of B's with special
A's at all, or correlation is zero. Thus the correlation ratio $\eta$, as defined by either
(xiv.) or (xvi.), is an excellent measure of the stringency of correlation, always lying
numerically between the values 0 and 1, which mark absolute independence and

* I.e., a curve which measures the "scatter" in the arrays.




complete causation respectively. Further, remembering the definition of $r$, the
coefficient of correlation, i.e.,

$$N r\,\sigma_x\sigma_y = S\{n_x(x-\bar{x})(y_x-\bar{y})\} \qquad \text{(xvii.)},$$

we have, from (xv.) and (xvii.),

$$N(\eta^2-r^2)\,\sigma_y^2 = S\{n_x(y_x-\bar{y})^2\} - \frac{r\sigma_y}{\sigma_x}\,S\{n_x(x-\bar{x})(y_x-\bar{y})\}.$$

Now let

$$Y = \bar{y} + \frac{r\sigma_y}{\sigma_x}(x-\bar{x}) \qquad \text{(xviii.)},$$

then (xviii.), as is well known, gives the best fitting straight line to the series of
points $y_x$, loaded with their respective $n_x$. We can now write

$$N(\eta^2-r^2)\,\sigma_y^2 = S\{n_x(y_x-Y)^2\} + S\{n_x(Y-\bar{y})(y_x-Y)\}.$$

But, using (xviii.),

$$S\{n_x(Y-\bar{y})(y_x-Y)\} = \frac{r\sigma_y}{\sigma_x}\,S\Big\{n_x(x-\bar{x})\Big(y_x-\bar{y}-\frac{r\sigma_y}{\sigma_x}(x-\bar{x})\Big)\Big\} = \frac{r\sigma_y}{\sigma_x}\Big[N r\,\sigma_x\sigma_y-\frac{r\sigma_y}{\sigma_x}\,N\sigma_x^2\Big] = 0.$$

Thus the last summation vanishes, and we have

$$N(\eta^2-r^2)\,\sigma_y^2 = S\{n_x(y_x-Y)^2\} \qquad \text{(xix.)}.$$

The right-hand side must always be positive, unless $y_x=Y$, when it is zero. Hence
we conclude that $\eta$ is always greater than $r$, or the correlation ratio greater than the
correlation coefficient, except in the special case when the means of the $x$-arrays of $y$'s
all fall on a straight line, i.e., we have linear regression, and then the two correlation
constants are equal.

Thus the expression $(\eta^2-r^2)\,\sigma_y^2$ has an important physical meaning; it is the mean
square deviation of the regression curve from the straight line which fits this curve
most closely.* We have now freed our treatment of correlation from any condition
as to linearity of the regression, and it remains to consider the probable errors of the
various quantities dealt with.
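The relation between the two constants can be verified on any small correlation table. The sketch below is illustrative only: it computes $r$, the correlation ratio $\eta=\sigma_M/\sigma_y$ and the sum $S\{n_x(y_x-Y)^2\}$ about the best-fitting straight line, so that the identity $N(\eta^2-r^2)\sigma_y^2=S\{n_x(y_x-Y)^2\}$ may be checked. The function name `correlation_summary` and the dictionary layout of the table are assumptions, not the author's notation.

```python
from math import sqrt


def correlation_summary(table):
    """table maps (x, y) -> frequency; returns N, r, eta, var_y and the
    weighted sum of squared deviations of the array means from the line."""
    total = sum(table.values())
    x_bar = sum(f * x for (x, _), f in table.items()) / total
    y_bar = sum(f * y for (_, y), f in table.items()) / total
    var_x = sum(f * (x - x_bar) ** 2 for (x, _), f in table.items()) / total
    var_y = sum(f * (y - y_bar) ** 2 for (_, y), f in table.items()) / total
    product = sum(f * (x - x_bar) * (y - y_bar) for (x, y), f in table.items()) / total
    r = product / sqrt(var_x * var_y)

    # Frequency n_x and mean y_x of each x-array.
    n_x, sum_y = {}, {}
    for (x, y), f in table.items():
        n_x[x] = n_x.get(x, 0) + f
        sum_y[x] = sum_y.get(x, 0.0) + f * y
    array_mean = {x: sum_y[x] / n_x[x] for x in n_x}

    # Mean square deviation of the array means, and the correlation ratio.
    var_m = sum(n * (array_mean[x] - y_bar) ** 2 for x, n in n_x.items()) / total
    eta = sqrt(var_m / var_y)

    # Best-fitting straight line Y = y_bar + r * (sigma_y / sigma_x) * (x - x_bar).
    slope = r * sqrt(var_y / var_x)
    residual = sum(
        n * (array_mean[x] - (y_bar + slope * (x - x_bar))) ** 2
        for x, n in n_x.items()
    )
    return {"N": total, "r": r, "eta": eta, "var_y": var_y, "residual": residual}
```

For the three points $(-1, 1)$, $(0, 0)$, $(1, 1)$, each with frequency 1, the regression is exactly parabolic: $r$ is zero while $\eta$ is unity, which is the extreme case of the inequality stated above.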

(3.) Probable Errors of Constants of Correlation. 

We shall first prove a number of general propositions relating to the probable
errors of correlation constants. We first note that if $n$ and $n'$ be the frequencies in

* The properties of the correlation ratio were briefly noted in a footnote to a paper by the author in
'Roy. Soc. Proc.,' vol. 71, pp. 303-4. It has been systematically used in my laboratory for some years
and determined alongside $r$ for many distributions.


any two sub-groups of a total $N$, for which no member of $n$ is a member of $n'$, then
the standard deviation of $n$ due to random sampling is given by

$$\Sigma_n^2 = n\left(1-\frac{n}{N}\right) \qquad \text{(xx.)},$$

and the correlation between deviations in $n$ and $n'$ due to random sampling is given
by

$$R_{nn'}\,\Sigma_n\Sigma_{n'} = -\frac{nn'}{N} \qquad \text{(xxi.)}.$$
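These two results can be confirmed by direct enumeration. The sketch below assumes, as the text does not state in so many words, that a random sample of $N$ individuals is a multinomial draw and that $n$, $n'$ stand for the frequencies expected in the general population; it then computes the exact variance of one group frequency and the covariance of two. The function name `group_moments` and the restriction to three groups are choices made for illustration.

```python
from math import factorial


def group_moments(total, proportions):
    """Exact mean frequencies, variance of the first group and covariance of
    the first two groups, for `total` individuals falling at random into
    three groups with the given chances."""
    p1, p2, p3 = proportions
    outcomes = []
    for i in range(total + 1):
        for j in range(total + 1 - i):
            k = total - i - j
            ways = factorial(total) // (factorial(i) * factorial(j) * factorial(k))
            outcomes.append((i, j, ways * p1 ** i * p2 ** j * p3 ** k))
    mean1 = sum(i * p for i, _, p in outcomes)
    mean2 = sum(j * p for _, j, p in outcomes)
    var1 = sum((i - mean1) ** 2 * p for i, _, p in outcomes)
    cov12 = sum((i - mean1) * (j - mean2) * p for i, j, p in outcomes)
    return mean1, mean2, var1, cov12
```

With $N=6$ and chances 0.2, 0.3, 0.5 the expected frequencies are $n=1.2$ and $n'=1.8$; the enumeration returns a variance of $n(1-n/N)=0.96$ and a covariance of $-nn'/N=-0.36$.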



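Problem I. To find the correlation in deviations due to random sampling between
the number $n_{x_p}$ in the $x_p$-array of $y$'s and the number $n_{y_s}$ in the $y_s$-array of $x$'s.

If the symbol $\delta n$ denote the error or deviation in $n$, we have with an obvious
subscript notation*

$$\delta n_{x_p} = \delta n_{x_py_1}+\delta n_{x_py_2}+\ldots+\delta n_{x_py_q},$$

if there be $q$ groups of $y$'s, and again

$$\delta n_{y_s} = \delta n_{x_1y_s}+\delta n_{x_2y_s}+\ldots+\delta n_{x_iy_s},$$

if there be $i$ groups of $x$'s.

Multiply the expressions for $\delta n_{x_p}$ and $\delta n_{y_s}$ together and we have

$$\delta n_{x_p}\,\delta n_{y_s} = (\delta n_{x_py_s})^2 + S(\delta n_{x_py_u}\,\delta n_{x_vy_s}),$$

where the summation is for every pair of values of $u$ and $v$, differing from $s$ and $p$.

Summing all such pairs of values for every random sample and dividing by the
number of samples taken, we have the usual definition of correlation

$$\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}} = n_{x_py_s}\left(1-\frac{n_{x_py_s}}{N}\right) - S\left(\frac{n_{x_py_u}\,n_{x_vy_s}}{N}\right);$$

or,

$$\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}} = n_{x_py_s} - \frac{n_{x_p}\,n_{y_s}}{N}.$$

This gives $R_{n_{x_p}n_{y_s}}$, the required correlation, since $\Sigma_{n_{x_p}}$ and $\Sigma_{n_{y_s}}$ are known from (xx.).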
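The product $\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}}$ is the covariance of a row total and a column total of the correlation table, and on the reading $n_{x_py_s}-n_{x_p}n_{y_s}/N$ it can be checked by enumeration on the smallest possible table. The sketch below again assumes multinomial sampling with the $n$'s taken as expected frequencies; the function name `margin_covariance` and the $2\times2$ layout are hypothetical.

```python
from itertools import product
from math import factorial


def margin_covariance(total, probs):
    """Exact covariance of the first row total and the first column total of
    a 2 x 2 table filled by `total` random draws with cell chances `probs`."""
    cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
    mean_row = mean_col = mean_product = 0.0
    for counts in product(range(total + 1), repeat=4):
        if sum(counts) != total:
            continue
        ways = factorial(total)
        chance = 1.0
        for cell, count in zip(cells, counts):
            ways //= factorial(count)  # builds the multinomial coefficient
            chance *= probs[cell] ** count
        chance *= ways
        row_total = counts[0] + counts[1]  # frequency of the x-array
        col_total = counts[0] + counts[2]  # frequency of the y-array
        mean_row += chance * row_total
        mean_col += chance * col_total
        mean_product += chance * row_total * col_total
    return mean_product - mean_row * mean_col
```

With $N=5$ and cell chances 0.1, 0.2, 0.3, 0.4 the expected frequencies are $n_{x_py_s}=0.5$, $n_{x_p}=1.5$, $n_{y_s}=2.0$, and the enumerated covariance is $0.5-1.5\times2.0/5=-0.1$.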

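Problem II. To find the correlation between deviations in the total $n_{x_p}$ of any array
and in any sub-group $n_{x_py_s}$ of this array.

We have at once

$$\delta n_{x_p}\,\delta n_{x_py_s} = (\delta n_{x_py_s})^2 + S(\delta n_{x_py_u}\,\delta n_{x_py_s}),$$

where $u$ is to be taken every value other than $s$ in the summation term. Summing
for all random samples and dividing by their number, we have, after using results
like (xx.) and (xxi.),

$$\Sigma_{n_{x_p}}\Sigma_{n_{x_py_s}}R_{n_{x_p}n_{x_py_s}} = n_{x_py_s}\left(1-\frac{n_{x_p}}{N}\right),$$

which gives $R_{n_{x_p}n_{x_py_s}}$.

* $n_{xy}$ = frequency of groups with characters $x$ and $y$.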




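Proposition III. There is no correlation between deviations in the mean of an
$x$-array $y_{x_p}$ and the total number in that array.

Since $n_{x_p}y_{x_p} = S(n_{x_py_s}\,y_s)$, we have

$$n_{x_p}\,\delta y_{x_p} = S(\delta n_{x_py_s}\,y_s) - y_{x_p}\,\delta n_{x_p}.$$

Hence as before, using (xxiii.), &c.,

$$n_{x_p}\,\Sigma_{y_{x_p}}\Sigma_{n_{x_p}}R_{y_{x_p}n_{x_p}} = S\left\{n_{x_py_s}\left(1-\frac{n_{x_p}}{N}\right)y_s\right\} - y_{x_p}\,n_{x_p}\left(1-\frac{n_{x_p}}{N}\right) = 0,$$

which proves that $R_{y_{x_p}n_{x_p}}$ is zero.

Proposition IV. There is no correlation between deviations in the mean of an
$x$-array and in the total number in any other array.

Proof as before.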

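Proposition V. There is no correlation between deviations in the mean of one
$x$-array and in the mean of a second $x$-array.

We have

$$n_{x_p}\,\delta y_{x_p} = S(\delta n_{x_py_s}\,y_s) - y_{x_p}\,\delta n_{x_p},$$

$$n_{x_{p'}}\,\delta y_{x_{p'}} = S'(\delta n_{x_{p'}y_{s'}}\,y_{s'}) - y_{x_{p'}}\,\delta n_{x_{p'}}.$$

Multiply these two expressions together, sum for all random samples, and divide
by the number of such samples. We find

$$\begin{aligned}
n_{x_p}n_{x_{p'}}\,\Sigma_{y_{x_p}}\Sigma_{y_{x_{p'}}}R_{y_{x_p}y_{x_{p'}}} = {} & -y_{x_p}y_{x_{p'}}\,n_{x_p}n_{x_{p'}}/N \\
& + y_{x_p}\,S'\{n_{x_p}\,n_{x_{p'}y_{s'}}\,y_{s'}\}/N \\
& + y_{x_{p'}}\,S\{n_{x_py_s}\,n_{x_{p'}}\,y_s\}/N \\
& - S\,S'\{n_{x_py_s}\,n_{x_{p'}y_{s'}}\,y_s\,y_{s'}\}/N.
\end{aligned}$$

The last term is $-n_{x_p}n_{x_{p'}}\,y_{x_p}y_{x_{p'}}/N$, and thus the right-hand side is identically zero. It
thus appears that there is no correlation between errors made in finding the means of
two arrays. This result is not at once obvious, although a very little consideration
shows it must be true.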




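Proposition VI. To prove that the standard deviation of the mean $y_{x_p}$ of any
$x$-array due to random sampling equals $\sigma_{n_{x_p}}/\sqrt{n_{x_p}}$.

We have

$$n_{x_p}\,\delta y_{x_p} = S(\delta n_{x_py_s}\,y_s) - y_{x_p}\,\delta n_{x_p}.$$

Square, sum for all random samples, and divide by the number of such samples.
We have

$$\begin{aligned}
n_{x_p}^2\,\Sigma_{y_{x_p}}^2 = {} & y_{x_p}^2\,n_{x_p}\left(1-\frac{n_{x_p}}{N}\right) - 2y_{x_p}\,S\left\{n_{x_py_s}\left(1-\frac{n_{x_p}}{N}\right)y_s\right\} \\
& + S\left\{n_{x_py_s}\left(1-\frac{n_{x_py_s}}{N}\right)y_s^2\right\} - 2S'\{n_{x_py_s}\,n_{x_py_{s'}}\,y_s\,y_{s'}\}/N \\
= {} & n_{x_p}\,\sigma_{n_{x_p}}^2,
\end{aligned}$$

where $S'$ extends over every pair of different groups $s$, $s'$ of the array.

Hence

$$\Sigma_{y_{x_p}} = \sigma_{n_{x_p}}/\sqrt{n_{x_p}} \qquad \text{(xxiv.)}.$$

Thus the probable error of the mean of an array has exactly the same form as the
probable error of the mean of a random sample of a definite number of individuals.
The array may have a variable number of individuals, but we have seen in
Proposition III. that there is no correlation between errors in its mean and errors in
the total number of individuals contained in it.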

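Problem VII. To find the probable error of the standard deviation of any array.

By a precisely similar investigation to that of the previous proposition we find

$$\Sigma_{\sigma_{n_{x_p}}} = \sqrt{\frac{m_4-m_2^2}{4\,m_2\,n_{x_p}}} \qquad \text{(xxv.)},$$

where $m_2$ and $m_4$ are the second and fourth moments of the array about its mean.

This is identical with the probable error we should have if the array were a random
sample of constant size.

In many cases it will be sufficiently approximate to put $m_4=3m_2^2$ and we then
have

$$.67449\,\Sigma_{\sigma_{n_{x_p}}} = .67449\,\frac{\sigma_{n_{x_p}}}{\sqrt{2n_{x_p}}} \qquad \text{(xxvi.)},$$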




the well-known form for the probable error of the standard deviation of a normal
distribution of a definite number of individuals.
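The passage from the general to the normal form is a one-line substitution, which the following sketch makes explicit. It assumes that the general standard deviation of an array's standard deviation is $\sqrt{(m_4-m_2^2)/(4m_2n)}$, with $m_2$, $m_4$ the second and fourth moments of the array about its mean; the function names are hypothetical.

```python
from math import sqrt

PROBABLE_ERROR_FACTOR = 0.67449  # ratio of probable error to standard deviation


def sd_of_array_sd(m2, m4, n):
    """Standard deviation, due to random sampling, of the standard deviation
    of an array of n individuals (general form, assumed as stated above)."""
    return sqrt((m4 - m2 ** 2) / (4.0 * m2 * n))


def probable_error_of_sd_normal(sigma, n):
    """Probable error of the standard deviation when m4 = 3 * m2**2."""
    return PROBABLE_ERROR_FACTOR * sigma / sqrt(2.0 * n)
```

For an array of 50 individuals with $m_2=4$ and $m_4=48=3m_2^2$, both routes give a standard deviation of 0.2 for $\sigma$, and hence a probable error of about 0.135.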

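Problem VIII. To find the standard deviation of the standard deviation $\sigma_M$ of the
means of the arrays due to random sampling.

Since

$$N\sigma_M^2 = S\{n_{x_p}(y_{x_p}-\bar{y})^2\},$$

$$2N\sigma_M\,\delta\sigma_M = S\{\delta n_{x_p}(y_{x_p}-\bar{y})^2\} + 2S\{\delta y_{x_p}\,n_{x_p}(y_{x_p}-\bar{y})\} - 2\,\delta\bar{y}\,S\{n_{x_p}(y_{x_p}-\bar{y})\},$$

the last term of which vanishes, since

$$S\{n_{x_p}(y_{x_p}-\bar{y})\} = 0.$$

Square the above relation, sum for all random samples, and divide by the number
of such samples.
We find

$$\begin{aligned}
4N^2\sigma_M^2\,\Sigma_{\sigma_M}^2 = {} & S\left\{n_{x_p}\left(1-\frac{n_{x_p}}{N}\right)(y_{x_p}-\bar{y})^4\right\} \\
& - 2S'\left\{\frac{n_{x_p}n_{x_{p'}}}{N}(y_{x_p}-\bar{y})^2(y_{x_{p'}}-\bar{y})^2\right\} \\
& + 4S\{\Sigma_{n_{x_p}}\Sigma_{y_{x_{p'}}}R_{n_{x_p}y_{x_{p'}}}\,n_{x_{p'}}(y_{x_p}-\bar{y})^2(y_{x_{p'}}-\bar{y})\} \\
& + 8S'\{\Sigma_{y_{x_p}}\Sigma_{y_{x_{p'}}}R_{y_{x_p}y_{x_{p'}}}\,n_{x_p}n_{x_{p'}}(y_{x_p}-\bar{y})(y_{x_{p'}}-\bar{y})\} \\
& + 4S\{\Sigma_{y_{x_p}}^2\,n_{x_p}^2(y_{x_p}-\bar{y})^2\},
\end{aligned}$$

where $S'$ extends over every pair of different arrays.

But $R_{n_{x_p}y_{x_p}}$, $R_{n_{x_p}y_{x_{p'}}}$, and $R_{y_{x_p}y_{x_{p'}}}$ vanish by Propositions III., IV., and V. Further, by
VI., $\Sigma_{y_{x_p}}=\sigma_{n_{x_p}}/\sqrt{n_{x_p}}$. Hence we have

$$\begin{aligned}
4N^2\sigma_M^2\,\Sigma_{\sigma_M}^2 = {} & S\left\{n_{x_p}\left(1-\frac{n_{x_p}}{N}\right)(y_{x_p}-\bar{y})^4\right\} \\
& - 2S'\left\{\frac{n_{x_p}n_{x_{p'}}}{N}(y_{x_p}-\bar{y})^2(y_{x_{p'}}-\bar{y})^2\right\} \\
& + 4S\{n_{x_p}\,\sigma_{n_{x_p}}^2(y_{x_p}-\bar{y})^2\}.
\end{aligned}$$

Now let

$$N\lambda_q = S\{n_{x_p}(y_{x_p}-\bar{y})^q\}$$

be the $q$th moment of the means of the arrays about their mean. Then clearly
$\lambda_2=\sigma_M^2$. Further, since $S(n_{x_p}\sigma_{n_{x_p}}^2)=N\sigma_y^2(1-\eta^2)$, we can write

$$S\{n_{x_p}\,\sigma_{n_{x_p}}^2(y_{x_p}-\bar{y})^2\} = N\sigma_y^2(1-\eta^2)\,\sigma_M^2\,\chi_1,$$

where $\chi_1$ is a purely numerical constant, which is equal to unity for those cases in
which there is no correlation between the standard deviation of an array and the
square of its mean's deviation from the mean. Thus finally we find

$$\Sigma_{\sigma_M}^2 = \frac{1}{N}\left\{\frac{\lambda_4-\lambda_2^2}{4\lambda_2} + \sigma_y^2(1-\eta^2)\,\chi_1\right\}.$$

This enables us at once to find the probable error of the standard deviation of the
means of the arrays.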

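Proposition IX. To find the correlation between the deviations due to random
sampling in the values of $\sigma_y$ and $\sigma_M$.

We have

$$N\sigma_y^2 = S\{n_{y_s}(y_s-\bar{y})^2\},$$

$$2N\sigma_y\,\delta\sigma_y = S\{\delta n_{y_s}(y_s-\bar{y})^2\} - 2\,\delta\bar{y}\,S\{n_{y_s}(y_s-\bar{y})\};$$

the last term vanishes because $S(n_{y_s}y_s)=N\bar{y}$.
Thus

$$2N\sigma_y\,\delta\sigma_y = S\{\delta n_{y_s}(y_s-\bar{y})^2\}.$$

But from the previous proposition

$$2N\sigma_M\,\delta\sigma_M = S\{\delta n_{x_p}(y_{x_p}-\bar{y})^2\} + 2S\{\delta y_{x_p}\,n_{x_p}(y_{x_p}-\bar{y})\}.$$

Multiply these two expressions together, sum for all random samples and divide by
the number of such samples; we find

$$\begin{aligned}
4N^2\sigma_y\sigma_M\,\Sigma_{\sigma_y}\Sigma_{\sigma_M}R_{\sigma_y\sigma_M} = {} & S\{\Sigma_{n_{y_s}}\Sigma_{n_{x_p}}R_{n_{y_s}n_{x_p}}(y_s-\bar{y})^2(y_{x_p}-\bar{y})^2\} \\
& + 2S\{\Sigma_{n_{y_s}}\Sigma_{y_{x_p}}R_{n_{y_s}y_{x_p}}\,n_{x_p}(y_s-\bar{y})^2(y_{x_p}-\bar{y})\}.
\end{aligned}$$

To evaluate this, we require to find the two correlations expressed by $R_{n_{y_s}n_{x_p}}$ and
$R_{n_{y_s}y_{x_p}}$. We will consider the two summation terms separately.

First Term.

$$\delta n_{x_p} = \delta n_{x_py_1}+\delta n_{x_py_2}+\ldots+\delta n_{x_py_{s'}}+\ldots$$

$$\delta n_{y_s} = \delta n_{x_1y_s}+\delta n_{x_2y_s}+\ldots+\delta n_{x_{p'}y_s}+\ldots$$

$$\delta n_{x_p}\,\delta n_{y_s} = (\delta n_{x_py_s})^2 + S(\delta n_{x_py_{s'}}\,\delta n_{x_{p'}y_s}),$$

where in the summation $p'$ and $s'$ are not equal to $p$ and $s$.
Proceeding in the usual manner we find

$$\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}} = n_{x_py_s}\left(1-\frac{n_{x_py_s}}{N}\right) - S\left\{\frac{n_{x_py_{s'}}\,n_{x_{p'}y_s}}{N}\right\},$$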







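where in the first sum $s'$ is to take all possible values, and in the second $p'$ is to take
all possible values. Thus we have

$$\Sigma_{n_{x_p}}\Sigma_{n_{y_s}}R_{n_{x_p}n_{y_s}} = n_{x_py_s} - \frac{n_{x_p}\,n_{y_s}}{N} \qquad \text{(xxviii.)}.$$

Substituting we find

$$\text{First Term} = S_1\{n_{x_py_s}(y_s-\bar{y})^2(y_{x_p}-\bar{y})^2\} - S_2\left\{\frac{n_{x_p}\,n_{y_s}}{N}(y_s-\bar{y})^2(y_{x_p}-\bar{y})^2\right\}.$$

Here both the summations are really double summations; fixing our attention on
any $x_p$, i.e., on any array of $y$'s for a given value of $x$, we have first to sum for all $y$'s
in this array, and then we have to sum for all arrays. This is the meaning of $S_1$. In
$S_2$ we are to associate every array of $x$'s with every array of $y$'s; hence this term will
break up at once into two factors, i.e.,

$$S_2 = \frac{1}{N}\,S\{n_{y_s}(y_s-\bar{y})^2\}\times S\{n_{x_p}(y_{x_p}-\bar{y})^2\} = N\sigma_y^2\times\sigma_M^2.$$

Keeping $x_p$ constant first in $S_1$, we see that

$$\frac{1}{n_{x_p}}\,S\{n_{x_py_s}(y_s-\bar{y})^2\} = \sigma_{n_{x_p}}^2 + (y_{x_p}-\bar{y})^2$$

is the 2nd moment of the $y$'s in the $x_p$ array about the mean of the system.

Combining we have

$$\text{First Term} = S\{n_{x_p}(y_{x_p}-\bar{y})^4\} + S\{n_{x_p}\,\sigma_{n_{x_p}}^2(y_{x_p}-\bar{y})^2\} - N\sigma_y^2\sigma_M^2.$$

We now turn to the second term which involves the discovery of $R_{n_{y_s}y_{x_p}}$. We have

$$n_{x_p}\,\delta y_{x_p} = S(\delta n_{x_py_{s'}}\,y_{s'}) - y_{x_p}\,\delta n_{x_p}.$$

Hence

$$n_{x_p}\,\delta n_{y_s}\,\delta y_{x_p} = S(\delta n_{x_py_{s'}}\,y_{s'})\,\delta n_{y_s} - y_{x_p}\,(\delta n_{x_py_1}+\delta n_{x_py_2}+\ldots+\delta n_{x_py_{s'}}+\ldots)\,\delta n_{y_s}.$$

Sum for all random samples and divide by the number of such samples; we have

$$n_{x_p}\,\Sigma_{n_{y_s}}\Sigma_{y_{x_p}}R_{n_{y_s}y_{x_p}} = n_{x_py_s}\,y_s - \frac{n_{y_s}\,n_{x_p}\,y_{x_p}}{N} - y_{x_p}\left(n_{x_py_s}-\frac{n_{x_p}\,n_{y_s}}{N}\right) = n_{x_py_s}\,(y_s-y_{x_p}) \qquad \text{(xxx.)}.$$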




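Substituting we have

$$\text{Second Term} = 2S\{n_{x_p y_s}(y_s-\bar{y}_{x_p})(y_s-\bar{y})^2(\bar{y}_{x_p}-\bar{y})\}.$$

Here again the summation is of a double character.

Let us first take $x_p$ as constant and sum for every value of $y_s$. We may write
$y_s-\bar{y} = (y_s-\bar{y}_{x_p}) + (\bar{y}_{x_p}-\bar{y})$, and our first summation will be

$$2(\bar{y}_{x_p}-\bar{y}) \times S[n_{x_p y_s}\{(y_s-\bar{y}_{x_p})^3 + 2(y_s-\bar{y}_{x_p})^2(\bar{y}_{x_p}-\bar{y}) + (y_s-\bar{y}_{x_p})(\bar{y}_{x_p}-\bar{y})^2\}].$$

The last term vanishes, for $S(n_{x_p y_s}y_s) = n_{x_p}\bar{y}_{x_p}$ by the definition of the mean.
Hence

$$\text{Second Term} = 2S\{n_{x_p}m_3(\bar{y}_{x_p}-\bar{y})\} + 4S\{n_{x_p}\sigma_{n_{x_p}}^2(\bar{y}_{x_p}-\bar{y})^2\}.$$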

Here $m_3$ is the third moment of the $x_p$ array of $y$'s, which will probably be very
small if the arrays are nearly symmetrical, and the first term clearly depends on the
existence of a correlation between the skewness of the arrays and the magnitude
of their means.

We may write the first term then : 

= 2N<r,,VM X Xs 
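$$= 2N\sigma_y^3(1-\eta^2)^{3/2}\sigma_M \times \chi_2,$$

where $\chi_2$ is a purely numerical quantity, which for most cases will probably be very
small or even zero.
Thus we find:

$$\text{Second Term} = 2N\sigma_y^3(1-\eta^2)^{3/2}\sigma_M\chi_2 + 4N\sigma_y^2\sigma_M^2(1-\eta^2)\chi_1 \quad \text{(xxxi.)}.$$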

We can now return to p. 16 and write down the full correlation between deviations
in the values of $\sigma_y$ and $\sigma_M$ due to random sampling. Remembering that $\sigma_M = \eta\sigma_y$,*
we find:



* It should be remembered that this definition of $\eta$ gives it invariably the positive sign.




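Proposition X. To find the standard deviation of the values of the correlation
ratio $\eta$ due to random sampling, i.e., to find the probable error of the correlation
ratio $\eta$.

We have

$$\eta = \sigma_M/\sigma_y.$$

Hence

$$\frac{\delta\eta}{\eta} = \frac{\delta\sigma_M}{\sigma_M} - \frac{\delta\sigma_y}{\sigma_y}.$$

Squaring, summing for all random samples and dividing by the number of such
samples, we have:

$$\frac{\Sigma_\eta^2}{\eta^2} = \frac{\Sigma_{\sigma_M}^2}{\sigma_M^2} + \frac{\Sigma_{\sigma_y}^2}{\sigma_y^2} - \frac{2\Sigma_{\sigma_M}\Sigma_{\sigma_y}R_{\sigma_M\sigma_y}}{\sigma_M\sigma_y}.$$

$\Sigma_{\sigma_M}^2$ is given by (xxvii.), $\Sigma_{\sigma_M}\Sigma_{\sigma_y}R_{\sigma_M\sigma_y}$ by (xxxii.), and $\Sigma_{\sigma_y}^2 = (\mu_4-\mu_2^2)/(4N\mu_2)$ by a well-known
formula.*

Substituting, we have the complete value of $\Sigma_\eta$ given by: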

i7« 4N V Ni7« ^ 4N /tj« 



2NV 2N^ '''Xi^2N Ni; 



or, after re-arranging, 






For normal correlation, $\mu_4 = 3\mu_2^2$. Further
and

Hence the second and third terms vanish. Further $\chi_1 = 1$ and $\chi_2 = 0$, while $\eta = r$.
Hence we have

which agrees with the special result.

* 'Biometrika,' vol. II., p. 376.


In any other case, $\chi_2$, $\chi_1 - 1$, $(\mu_4-3\mu_2^2)/\mu_2^2$, $(\lambda_4-3\lambda_2^2)/\lambda_2^2$ will probably be small, and
thus

$$\text{Probable error of } \eta = .67449\,(1-\eta^2)/\sqrt{N}, \text{ nearly} \quad \text{(xxxiv.)}.$$

This simple form suffices for many practical cases. 
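
As a small illustration of (xxxiv.), the following Python sketch evaluates the approximate probable error for a given correlation ratio and sample size. The function name `probable_error_eta` is purely illustrative and not from the paper; the figures fed to it are the woodruff values quoted later in this memoir ($\eta = .249,911$, $N = 679$).

```python
import math

def probable_error_eta(eta: float, n: int) -> float:
    """Approximate probable error of the correlation ratio, as in (xxxiv.)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 0.67449 * (1.0 - eta ** 2) / math.sqrt(n)

if __name__ == "__main__":
    # Woodruff data of Illustration A: eta = .249911, N = 679
    print(round(probable_error_eta(0.249911, 679), 4))
```

With the woodruff figures this reproduces the value the paper reports for the simple formula.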

If greater exactitude is wanted, there is, however, no great labour in using
(xxxiii.). We find the means and standard deviations of each array.

Then $N\lambda_2$ and $N\lambda_4$ are the 2nd and 4th moments of the means of these arrays
about their mean.

$N\mu_2$ and $N\mu_4$ are the 2nd and 4th moments about the mean of the $y$-characters, and
will always be known for skew variation.

$\chi_1$ is defined by

$$\chi_1 = \frac{S\{n_{x_p}\sigma_{n_{x_p}}^2(\bar{y}_{x_p}-\bar{y})^2\}}{N\sigma_y^2(1-\eta^2)\sigma_M^2} \quad \text{(xxxv.)},$$

and can be easily found when the means and standard deviations of each array have
been found.

The most troublesome expression is $\chi_2$ defined by

$$\chi_2 = \frac{S\{n_{x_p}m_3(\bar{y}_{x_p}-\bar{y})\}}{N\sigma_y^3(1-\eta^2)^{3/2}\sigma_M} \quad \text{(xxxvi.)}.$$

But as we do not take usually more than 10 to 20 arrays, the discovery of their
3rd moments is not an extremely difficult task. As a rule, however, $\chi_2$ is very small
and may be fairly neglected, even when we must find $\chi_1 - 1$; these points will
be dealt with in the numerical illustrations given later in this paper. At present
we note that the probable error of $\eta$ has been determined, and that its value for the
general case is not really more complex than the value of the probable error of $r$ in
the general case, which requires the determination of product moments of the 4th
order.*

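* Let $Np_{qq'} = S\{n_{xy}(x-\bar{x})^q(y-\bar{y})^{q'}\}$, then the probable error of $r$ is given by

$$\Sigma_r^2 = \frac{r^2}{N}\left\{\frac{p_{22}}{p_{11}^2} + \frac{p_{22}}{2p_{20}p_{02}} + \frac{p_{40}}{4p_{20}^2} + \frac{p_{04}}{4p_{02}^2} - \frac{p_{31}}{p_{11}p_{20}} - \frac{p_{13}}{p_{11}p_{02}}\right\} \quad \text{(xxxvii.)}.$$

This agrees with the value given by Sheppard ('Phil. Trans.,' A, vol. 192, p. 128), except that the $r^2$
factor has been dropped by a printer's error in his paper. For the special case of a normal distribution, we
have easily from the equation to the normal surface

$$p_{40} = 3p_{20}^2,\quad p_{04} = 3p_{02}^2,\quad p_{31} = 3p_{11}p_{20},\quad p_{13} = 3p_{11}p_{02},\quad (p_{22}-3p_{11}^2)/p_{11}^2 = (1-r^2)/r^2$$

and

$$\frac{p_{22}}{p_{20}p_{02}} = 2r^2 + 1, \quad\text{whence}\quad \Sigma_r = (1-r^2)/\sqrt{N},$$

the well-known form ('Phil. Trans.,' A, vol. 191, p. 245).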
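
The reduction of (xxxvii.) to the normal form can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `sd_of_r` and `normal_product_moments` are hypothetical, and the moment identities used for the normal surface are the ones listed in this footnote.

```python
import math

def sd_of_r(p11, p20, p02, p22, p40, p04, p31, p13, n):
    """Standard deviation of r from 4th-order product moments, after (xxxvii.)."""
    r = p11 / math.sqrt(p20 * p02)
    bracket = (p22 / p11 ** 2 + p22 / (2 * p20 * p02)
               + p40 / (4 * p20 ** 2) + p04 / (4 * p02 ** 2)
               - p31 / (p11 * p20) - p13 / (p11 * p02))
    return abs(r) * math.sqrt(bracket / n)

def normal_product_moments(r, sx=1.0, sy=1.0):
    """Product moments of a normal surface with correlation r (footnote identities)."""
    p20, p02, p11 = sx ** 2, sy ** 2, r * sx * sy
    return dict(p11=p11, p20=p20, p02=p02,
                p22=p20 * p02 * (1 + 2 * r ** 2),
                p40=3 * p20 ** 2, p04=3 * p02 ** 2,
                p31=3 * p11 * p20, p13=3 * p11 * p02)

if __name__ == "__main__":
    r, n = 0.5, 400
    general = sd_of_r(n=n, **normal_product_moments(r))
    special = (1 - r ** 2) / math.sqrt(n)
    print(general, special)  # the two values should agree
```

The probable error itself follows on multiplying the standard deviation by .67449.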






(4.) On the Higher Types of Regression. 

We have already seen how the introduction of the correlation ratio $\eta$ enables us to
drop the limitations associated with the Gauss-Laplacian form of frequency, and the
Bravais correlation formulae. The fundamental step towards this advance was
undoubtedly taken by G. U. Yule in his paper in the 'Roy. Soc. Proc.,' vol. 60,
pp. 477 et seq., wherein he shows that if the regression be linear, the Bravais type of
formula applied to multiple correlation is still true, although we make no assumption
as to the form of the frequency surface. It would undoubtedly be a gain to have
skew frequency surfaces which would describe skew correlation for the great mass of
cases as effectively as the series of skew frequency curves describe skew variation, but
although a considerable amount of progress has been made in the consideration of
these surfaces, their full theory has not yet been worked out owing to difficulties
of analysis, and their complete discussion must still be postponed. Yule's method
of approaching the problem from the form of the regression curves is, however, 
available and capable of very great extension. Its chief advantage is that it 
makes little or no assumption as to the distribution of frequency ; its chief defect 
lies even in this advantage of generality : it does not enable us to predict the 
probability of an individual with a given combination of characters. This follows at 
once from the fact that we make no assumption as to the form of the distribution 
within an array. Without some theory as to variation within the array, we are 
reduced to the laborious process of calculating the standard deviation, skewness, and 
other general characters of each array, a lengthy and troublesome process compared 
with a theory which would, like the Bravais theory, give these at once in terms of a 
few constants determined from the data as a whole. 

In the great bulk of biometrical and economical enquiries, however, the regression
does not diverge very markedly from the linear form. In the cases of non-linear
regression that I have hitherto had to deal with, I find that parabolae of the 2nd
or 3rd order will suffice as a rule to describe the deviation from linearity. If
they did not, we could, of course, use curves of higher orders, but the difficulty
referred to in the first section of this paper at once arises: we then need to use
in the determination moments and product-moments of such high orders that the
probable errors of the constants are so high as to render valueless their calculation
from such statistical data as we can hope for in most actual inquiries. In the great
bulk of investigations it is practically impossible to increase our random samples
from 500 to 1,000 individuals up to 50,000 to 100,000. Nor in the great
bulk of statistical cases is any such increase even desirable, for a fairly wide
experience shows that 2nd and 3rd order parabolae amply suffice to describe the
skewness of the regression line. I shall accordingly classify skew correlation in the
following manner:



(a.) Linear Regression:

The mean of an $x$-array of $y$'s, i.e., $\bar{y}_{x_p}$, is given by

$$\bar{y}_{x_p} = a_0 + a_1x_p \quad \text{(xxxviii.)}.$$

(b.) Parabolic* Regression:

The mean of an $x$-array of $y$'s, i.e., $\bar{y}_{x_p}$, is given by

$$\bar{y}_{x_p} = a_0 + a_1x_p + a_2x_p^2 \quad \text{(xxxix.)}.$$

(c.) Cubical* Regression:

The mean of an $x$-array of $y$'s, i.e., $\bar{y}_{x_p}$, is given by

$$\bar{y}_{x_p} = a_0 + a_1x_p + a_2x_p^2 + a_3x_p^3 \quad \text{(xl.)}.$$

It is conceivable (in fact, from unpublished work already done, highly probable)
that the theory of skew variation will give regression curves, not of the exact form
involved in (xxxix.) or (xl.), but containing product terms in $x$ and $y$. The most
general equation to a regression curve may be taken to be of the type

and what experience shows us is: that for the great bulk of vital phenomena it is
sufficient to expand by Maclaurin's theorem and keep the first three or four terms.
Indeed, in the large majority of cases, (xxxviii.) alone suffices. Hence, if (xxxix.)
or (xl.) fit the data within the limits of random sampling, we are not injudiciously
circumscribing future developments of the theory of skew correlation by casting our
regression curves into the above forms. I shall deal first with the theory of cubical
regression, for we can then obtain from this the conditions necessary for parabolic
and linear regressions.

I must remind the reader, however, that the form of the regression line does not in
any way limit the nature of the distribution of the array about its mean; the
variability of an array, i.e., the standard deviation of an array, having for its mean
value $\sigma_y\sqrt{1-\eta^2}$, may or may not be the same for all arrays. If it is the same, or all
arrays are equally scattered about their means, I shall speak of the system as a
homoscedastic system, otherwise it is a heteroscedastic system. The Gauss-Laplacian
correlation surface gives a homoscedastic linear system. Mr. Yule's linear regression
is not necessarily homoscedastic; it may, however, be homoscedastic without being
normal, and then the scatter of each array is measured by $\sigma_y\sqrt{1-r^2}$. When a
system is homoscedastic, but not linear, then $\sigma_{n_{x_p}}^2 = \sigma_y^2(1-\eta^2)$, and consequently the
$\chi_1$ of (xxxv.) is equal to unity. $\chi_1 = 1$ is a necessary result of homoscedasticity.
Lastly, we want a word to express the idea of all the arrays having equal skewness, 

* 'Parabolic' and 'cubical' are here used in the narrower sense of regression curves corresponding to
ordinary parabolas of the 2nd order and of the 3rd order respectively: in both cases the axis of the
parabola being parallel to the axis of the $y$-character.




or being asymmetrical in an equal degree about their means. I shall express this by
the term homoclitic; generally the arrays will not be equally asymmetrical round their
means, and in this case we shall speak of them as heteroclitic. If there were no
skewness in any of the arrays, then $m_3$ of (xxxvi.) would be zero for all of them.
I term arrays of no skewness isocurtic, and skew arrays allocurtic. If we supposed
that a curve of Type III. would sufficiently express the skewness of an array, we
should have

$$\text{Sk.} = \tfrac{1}{2}m_3/\sigma_{n_{x_p}}^3,$$

and therefore from (xxxvi.)

$$\chi_2 = \frac{2S\{n_{x_p}\sigma_{n_{x_p}}^3(\text{Sk.})(\bar{y}_{x_p}-\bar{y})\}}{N\sigma_y^3(1-\eta^2)^{3/2}\sigma_M}.$$

For a homoscedastic system we have $\sigma_{n_{x_p}} = \sigma_y\sqrt{1-\eta^2}$, and therefore

$$\chi_2 = \frac{2S\{n_{x_p}(\text{Sk.})(\bar{y}_{x_p}-\bar{y})\}}{N\sigma_M},$$

and for a homoclitic system

$$\chi_2 = \frac{2(\text{Sk.})\,S\{n_{x_p}\sigma_{n_{x_p}}^3(\bar{y}_{x_p}-\bar{y})\}}{N\sigma_y^3(1-\eta^2)^{3/2}\sigma_M}.$$

For a homoclitic homoscedastic system, whether isocurtic or allocurtic,

$$\chi_2 = \frac{2(\text{Sk.})\,S\{n_{x_p}(\bar{y}_{x_p}-\bar{y})\}}{N\sigma_M} = 0.$$

Thus $\chi_2$ is to a certain extent a measure of both homoscedasticity and homoclisy.
But as the correlation between $\sigma_{n_{x_p}}$ and $\bar{y}_{x_p}-\bar{y}$ is in most cases extremely small, while
the skewness of the array can well change its sign with arrays above or below the
mean, we can fairly consider the smallness of $\chi_2$ to be a measure of the approach to
homoclisy. I am thus inclined to speak of $\chi_1 - 1$ and $\chi_2$ as measures of heteroscedasticity
and heteroclisy. When they both vanish we have a homoscedastic homoclitic system.
For such systems $\eta$, the correlation ratio, tells us effectively the scatter of any array,
and as a rule all we want to know, in addition, is the form of the regression line.

(5.) Cubical Regression. 

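We have already used the following notation

$$Np_{qq'} = S\{n_{xy}(x-\bar{x})^q(y-\bar{y})^{q'}\} \quad \text{(xlii.)}.$$

We shall shorten our formulae if we write

$$r = p_{11}/(\sigma_x\sigma_y),\quad \epsilon' = p_{21}/(\sigma_x^2\sigma_y),\quad \zeta' = p_{31}/(\sigma_x^3\sigma_y),\quad \theta' = p_{41}/(\sigma_x^4\sigma_y) \quad \text{(xliii.)}.$$

We have already used $\mu_q$ to denote $p_{0q}$ and we shall use $\nu_q$ for $p_{q0}$. Further, we
write

$$\beta_1 = \nu_3^2/\nu_2^3,\quad \beta_2 = \nu_4/\nu_2^2,\quad \beta_3 = \nu_3\nu_5/\nu_2^4,\quad \beta_4 = \nu_6/\nu_2^3 \quad \text{(xliv.)}.$$

$\sqrt{\beta_1} = \nu_3/\sigma_x^3$ will be of the same sign as $\nu_3$. These constants $\beta$ have been previously
used in the theory of skew variation.*
We shall further put

$$\epsilon = \epsilon' - r\sqrt{\beta_1},\quad \zeta = \zeta' - r\beta_2,\quad \theta = \theta' - r\beta_3/\sqrt{\beta_1} \quad \text{(xlv.)}.$$

The regularity of the forms $\epsilon$, $\zeta$, $\theta$ is rather screened by the above notation, which
is introduced for brevity; using the $p_{qq'}$ notation, we have

$$\epsilon = \frac{p_{21}p_{20}-p_{11}p_{30}}{\sigma_x^4\sigma_y},\quad \zeta = \frac{p_{31}p_{20}-p_{11}p_{40}}{\sigma_x^5\sigma_y},\quad \theta = \frac{p_{41}p_{20}-p_{11}p_{50}}{\sigma_x^6\sigma_y} \quad \text{(xlvi.)},$$

whence the law of formation of these constants is easily seen.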
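The regression curve may now be conveniently put into the form

$$\frac{\bar{y}_{x_p}-\bar{y}}{\sigma_y} = b_0 + b_1\frac{x_p-\bar{x}}{\sigma_x} + b_2\left(\frac{x_p-\bar{x}}{\sigma_x}\right)^2 + b_3\left(\frac{x_p-\bar{x}}{\sigma_x}\right)^3 \quad \text{(xlvii.)}.$$

Or, multiplying by $n_{x_p}$ and summing for all arrays,

$$0 = b_0 + b_2 + b_3\sqrt{\beta_1},$$

the sign of $\sqrt{\beta_1}$ being always that of the 3rd moment. Hence, measuring from
the means of the two characters, i.e., $X_p = x_p - \bar{x}$, $Y_{x_p} = \bar{y}_{x_p} - \bar{y}$, we may re-write (xlvii.)

$$Y_{x_p}/\sigma_y = b_1(X_p/\sigma_x) + b_2\{(X_p/\sigma_x)^2 - 1\} + b_3\{(X_p/\sigma_x)^3 - \sqrt{\beta_1}\} \quad \text{(xlviii.)}.$$

Now multiply by $n_{x_p}X_p/\sigma_x$ and sum for all arrays, remembering that

$$Nr\sigma_x\sigma_y = S(n_{xy}XY) = S(n_{x_p}X_pY_{x_p}),$$

we find

$$r = b_1 + b_2\sqrt{\beta_1} + b_3\beta_2.$$

This enables us to get rid of $b_1$ and write (xlviii.)

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + b_2\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\} + b_3\{(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x) - \sqrt{\beta_1}\} \quad \text{(xlix.)}.$$

Now multiply by $n_{x_p}(X_p/\sigma_x)^2$ and sum for all arrays. We have

$$\epsilon' = r\sqrt{\beta_1} + b_2(\beta_2-\beta_1-1) + b_3(\beta_3/\sqrt{\beta_1} - \beta_2\sqrt{\beta_1} - \sqrt{\beta_1}),$$

or

$$\epsilon = b_2\phi_2 + b_3\phi_3,$$

where

$$\phi_2 = \beta_2-\beta_1-1,\qquad \phi_3 = (\beta_3-\beta_1\beta_2-\beta_1)/\sqrt{\beta_1}.$$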



* 'Phil. Trans.,' A, vol. 186, p. 368, and A, vol. 198, p. 278.






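Eliminating $b_2$, we can write (xlix.)

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{\epsilon}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\} + b_3\Big[\{(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x) - \sqrt{\beta_1}\} - \frac{\phi_3}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}\Big] \quad \text{(lii.)}.$$

Now multiply by $n_{x_p}(X_p/\sigma_x)^3$ and sum for all arrays; we find

$$\zeta' = r\beta_2 + \frac{\epsilon}{\phi_2}\phi_3 + b_3\Big(\phi_4 - \frac{\phi_3^2}{\phi_2}\Big),$$

or

$$b_3 = \frac{\zeta\phi_2 - \epsilon\phi_3}{\phi_2\phi_4 - \phi_3^2},$$

where

$$\phi_4 = \beta_4 - \beta_2^2 - \beta_1.$$

It follows from (l.) that

$$b_2 = \frac{\epsilon\phi_4 - \zeta\phi_3}{\phi_2\phi_4 - \phi_3^2}.$$

We can thus write the cubic regression curve in either of the forms*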

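* The method is perfectly easy of extension, if we choose to use higher products and moments, to a
regression curve of any order, e.g.,

$$Y_{x_p}/\sigma_y = b_0 + b_1(X_p/\sigma_x) + b_2(X_p/\sigma_x)^2 + \ldots + b_n(X_p/\sigma_x)^n + \ldots$$

For let:

$$N\epsilon_{q1} = S(n_{xy}YX^q)/(\sigma_x^q\sigma_y), \quad\text{and}\quad \gamma_q = \nu_q/\sigma_x^q = S(n_{x_p}X_p^q)/(N\sigma_x^q);$$

we have:

$$0 = b_0 + 0\times b_1 + b_2 + \gamma_3b_3 + \ldots + \gamma_nb_n + \ldots$$
$$\epsilon_{11} = 0\times b_0 + b_1 + \gamma_3b_2 + \gamma_4b_3 + \ldots + \gamma_{n+1}b_n + \ldots$$
$$\epsilon_{21} = b_0 + \gamma_3b_1 + \gamma_4b_2 + \gamma_5b_3 + \ldots + \gamma_{n+2}b_n + \ldots$$
$$\ldots$$
$$\epsilon_{p1} = \gamma_pb_0 + \gamma_{p+1}b_1 + \gamma_{p+2}b_2 + \gamma_{p+3}b_3 + \ldots + \gamma_{p+n}b_n + \ldots$$

Hence writing $\epsilon_{01}$ for 0, $\gamma_0 = 1$, $\gamma_1 = 0$, $\gamma_2 = 1$, we have

$$b_n = (\epsilon_{01}\Delta_{0n} + \epsilon_{11}\Delta_{1n} + \epsilon_{21}\Delta_{2n} + \ldots)/\Delta,$$

where

$$\Delta = \begin{vmatrix} \gamma_0 & \gamma_1 & \gamma_2 & \gamma_3 & \ldots & \gamma_n \\ \gamma_1 & \gamma_2 & \gamma_3 & \gamma_4 & \ldots & \gamma_{n+1} \\ \gamma_2 & \gamma_3 & \gamma_4 & \gamma_5 & \ldots & \gamma_{n+2} \\ \ldots & & & & & \\ \gamma_p & \gamma_{p+1} & \gamma_{p+2} & \gamma_{p+3} & \ldots & \gamma_{p+n} \\ \ldots & & & & & \end{vmatrix}$$

and $\Delta_{pn}$ is the minor of the constituent in the $(p+1)$th row and $(n+1)$th column. As we have already
noted, however, solutions involving anything beyond $\gamma_6$ are hardly likely to be of practical value.

The value above for $b_n$ is the type equation given by the method of least squares, when we strike the
best fitting curve to all the entries in the correlation table. I have already pointed out that the method
of moments becomes identical with that of least squares, when we fit parabolae of any order ('Biometrika,'
vol. I., p. 271). The retention of the method of moments, however, enables us, without abrupt change of
method, to introduce the needful $\eta$ and to grasp at once the application of the proper Sheppard's
corrections. The extension of the method of least squares to continua in space has not yet, as far as I am aware,
been fully considered.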

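$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{\epsilon}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\} + \frac{\zeta\phi_2-\epsilon\phi_3}{\phi_2\phi_4-\phi_3^2}\Big[\{(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x) - \sqrt{\beta_1}\} - \frac{\phi_3}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}\Big]$$

or

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{\epsilon\phi_4-\zeta\phi_3}{\phi_2\phi_4-\phi_3^2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\} + \frac{\zeta\phi_2-\epsilon\phi_3}{\phi_2\phi_4-\phi_3^2}\{(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x) - \sqrt{\beta_1}\}.$$

The former arrangement of the solution, while it is apparently more cumbersome,
is, perhaps, the better, for it gives us at once the measure of the deviation from
parabolic or 2nd order regression, i.e., the approach of $\zeta\phi_2-\epsilon\phi_3$ to zero. In the case
of normal correlation both $\epsilon$ and $\zeta$ vanish, and neglecting higher terms the condition
for linear regression is that $\epsilon = 0$, and $\zeta\phi_2-\epsilon\phi_3 = 0$, or, again, $\epsilon$ and $\zeta = 0$. For
material in which the $x$-variability is isocurtic, $\beta_1 = \beta_3 = \phi_3 = 0$, and the regression
curve takes the simple form

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{\epsilon}{\phi_2}\{(X_p/\sigma_x)^2 - 1\} + \frac{\zeta}{\phi_4}\{(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x)\} \quad \text{(lvi.) bis}.$$

We now turn to express these relations in terms of the correlation ratio $\eta$.
Multiply (lvi.) by $n_{x_p}Y_{x_p}/\sigma_y$, and sum for all arrays, we obtain

$$\eta^2 = r^2 + \frac{\epsilon^2}{\phi_2} + \frac{(\zeta\phi_2-\epsilon\phi_3)^2}{\phi_2(\phi_2\phi_4-\phi_3^2)},$$

whence results

$$\phi_2(\eta^2-r^2) - \epsilon^2 = \frac{(\zeta\phi_2-\epsilon\phi_3)^2}{\phi_2\phi_4-\phi_3^2} \quad \text{(lvii.)}.$$

(lvii.) is a necessary condition of cubical regression.

It is of course not a sufficient condition, as we ought to show that $b_4$, $b_5$, ... all
vanish, and thus any number of conditions may be found. For example, multiply by
$n_{x_p}X_p^4/\sigma_x^4$ and sum for all arrays, then

$$\theta = \frac{\epsilon\phi_4-\zeta\phi_3}{\phi_2\phi_4-\phi_3^2}(\beta_4-\beta_3-\beta_2) + \frac{\zeta\phi_2-\epsilon\phi_3}{\phi_2\phi_4-\phi_3^2}\cdot\frac{\beta_5-\beta_2\beta_3-\beta_1\beta_2}{\sqrt{\beta_1}} \quad \text{(lviii.)}$$

is also a necessary condition. Here $\beta_5 = \nu_7\nu_3/\sigma_x^{10}$. But the high as well as complicated
value of the probable errors of such expressions renders it idle to consider them in
practice.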




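Substituting (lvii.) in (lvi.) we have:

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{\epsilon}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\} \pm \sqrt{\frac{\phi_2(\eta^2-r^2)-\epsilon^2}{\phi_2\phi_4-\phi_3^2}}\Big[\{(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x) - \sqrt{\beta_1}\} - \frac{\phi_3}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\}\Big] \quad \text{(lix.)}.$$

Which sign is to be given to the root will often be visible on inspection of the
observations. Otherwise the sign of the root must be the same as that of
$\zeta\phi_2-\epsilon\phi_3$.
(lix.) will save the calculation of $\zeta$ if the root-sign can be found by inspection.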
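
The following Python sketch expands (lix.) into an ordinary cubic in $X_p$, so that the coefficients can be compared with those obtained in Illustration A. It is a minimal sketch under assumptions: the function name `cubic_from_lix` and its argument names are hypothetical, the sign of the root is supplied by the caller, and the numerical inputs in the usage part are the woodruff constants quoted later in the paper.

```python
import math

def cubic_from_lix(r, eta, eps, root_beta1, beta2, phi2, phi3, phi4,
                   sigma_x, sigma_y, y_mean, sign=1):
    """Return (c0, c1, c2, c3) with  y = c0 + c1*X + c2*X^2 + c3*X^3,
    X measured from the mean of x, following the form (lix.)."""
    b3 = sign * math.sqrt((phi2 * (eta ** 2 - r ** 2) - eps ** 2)
                          / (phi2 * phi4 - phi3 ** 2))
    k2 = (eps - b3 * phi3) / phi2            # total coefficient of the quadratic bracket
    const = -k2 - b3 * root_beta1            # in units of sigma_y
    lin = r - k2 * root_beta1 - b3 * beta2   # coefficient of X/sigma_x
    return (y_mean + sigma_y * const,
            sigma_y * lin / sigma_x,
            sigma_y * k2 / sigma_x ** 2,
            sigma_y * b3 / sigma_x ** 3)

if __name__ == "__main__":
    coeffs = cubic_from_lix(r=-0.207579, eta=0.249911, eps=-0.120164,
                            root_beta1=0.130487, beta2=1.828767,
                            phi2=0.811740, phi3=0.286465, phi4=0.610879,
                            sigma_x=1.336887, sigma_y=0.897842,
                            y_mean=6.655375, sign=1)
    print([round(c, 6) for c in coeffs])
```

The constant, square and cube coefficients come out very close to the cubic given under (c.) in Illustration A; the linear coefficient agrees to about three decimal places.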
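Finally there is a third form into which we may put the cubic. Eliminate $\phi_2\phi_4-\phi_3^2$
from (lix.) by aid of (lvii.) and it becomes

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{\epsilon\zeta-\phi_3(\eta^2-r^2)}{\zeta\phi_2-\epsilon\phi_3}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\} + \frac{\phi_2(\eta^2-r^2)-\epsilon^2}{\zeta\phi_2-\epsilon\phi_3}\{(X_p/\sigma_x)^3 - \beta_2(X_p/\sigma_x) - \sqrt{\beta_1}\} \quad \text{(lx.)}.$$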

At first sight this might appear to be the best form of the cubic, because it does
not involve the 6th moment of the variable $x$. But this is very far from being the
case in actual practice. The reason is simply this: $\epsilon$, $\zeta$ and $\eta^2-r^2$ are in most cases
very small (they vanish in normal correlation) relatively to $\phi_2$ and $\phi_4$. Hence both
numerators and denominators of the coefficients of the square and cubic terms are
the ratio of small quantities, and accordingly subject to large probable errors. For
this reason (lx.) was found in actual practice to be of no service. Of the other two
forms (lvii.) and (lix.), which neither suffer from this defect, $\phi_2\phi_4-\phi_3^2$ being always
large relative to the numerators, (lix.) while involving a 6th moment does not
involve a 4th product, $\zeta$, and experience shows that the former is on the whole
easier to determine and more exact than the latter. Hence (lix.) seems the preferable
form, even if it be needful in certain cases to determine $\zeta$ in order to fix the
sign of the radical. The cubic regression curve thus demands a knowledge of the
correlation ratio $\eta$, of the "cubic product" $\epsilon$ and the sign by inspection or calculation
of $\zeta\phi_2-\epsilon\phi_3$. Besides this, we require the first six moments of the independent
variable $x$. Of course if the regression of $x$ on $y$ be required, as well as that of
$y$ on $x$, the second correlation ratio and cubic product as well as the first six moments
of $y$ must be found. It is rare, however, that both regression curves are needed for
a single enquiry.

As to the general form of (lix.), we note that there will always be a real point of
inflexion given by

$$X_p/\sigma_x = \tfrac{1}{3}(b_3\phi_3-\epsilon)/(b_3\phi_2) \quad \text{(lxi.)},$$

where

$$b_3 = \pm\sqrt{\frac{\phi_2(\eta^2-r^2)-\epsilon^2}{\phi_2\phi_4-\phi_3^2}},$$

and further that there may be two points of horizontality given by a certain quadratic.
Thus, in general, the regression line will tend to be part of an S-shaped curve. The
horizontal points may be imaginary, or, if real, either they or the point of inflexion
may be far beyond the portion of the curve which crosses the observed field of
frequency. If we consider, however, the slope of the regression curve to measure
the regression in the neighbourhood of any point, we note that the regression is a
maximum at the point given by (lxi.), and grows smaller and smaller towards the two
points of horizontality, i.e., points of complete local independence of the two
characters. These are not unfamiliar features in certain practical cases of skew
correlation,* and accordingly the cubic regression curve provides us with a ready
means of describing regression phenomena, which cannot be dealt with by the simple
line or the parabola.

It may of course be suggested that a quartic or quintic curve would give a
better result than a cubic. The answer to this is: Possibly, but the high moments
and products required render it impossible to deal even superficially with the probable
errors of the constants involved. The calculation of the probable error of $\eta$ is a
sufficiently stiff task in the general case. To test the probable error of a condition
like (lvii.), to say nothing of one like (lviii.), would involve an immense amount of
work, since we should want the correlation of errors in $\eta$, $\epsilon$, $\zeta$, and $\theta$. Speaking with
some experience of practical statistical possibilities, I think the tendency to use very
high moments or product-moments must be curtailed to the minimum of actual needs.
We cannot deny the existence of skew variation, nor of the sensible curvature of
regression lines. We must admit their existence as the result of statistical experience.
This existence involves a great widening of the old frequency notions and the need
for a new means of description. But we must remember that statistics are essentially
a practical study, the art of describing by a few numerical constants observational
experience, and we must curtail at every turn the desire to run riot in mathematical
formulae, which cannot be generally applied in actual practice.† Still I propose later
in this paper to deal with the general formulae for quartic regression.

(6.) Parabolic Regression. 

For a parabolic system $b_3$ must vanish, or nearly vanish. Hence we have from
(liii.) and (lvii.)

$$\zeta\phi_2-\epsilon\phi_3 = 0 \quad \text{(lxii.)},$$

$$\phi_2(\eta^2-r^2)-\epsilon^2 = 0 \quad \text{(lxiii.)}.$$

* Compare for example the regression line of mean age of bridegroom for actual age of bride,
which gives a typical S-shaped curve. See 'Biometrika,' vol. II., p. 20.
† These remarks have special reference to the points dealt with on p. 6.




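From these conditions we find

$$b_2 = \epsilon/\phi_2,$$

or

$$b_2 = \epsilon/\phi_2 = \pm\sqrt{\eta^2-r^2}/\sqrt{\phi_2}.$$

These give for the form of the parabolic regression curve

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) + \frac{\epsilon}{\phi_2}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\} \quad \text{(lxiv.)},$$

$$Y_{x_p}/\sigma_y = r(X_p/\sigma_x) \pm \sqrt{\frac{\eta^2-r^2}{\beta_2-\beta_1-1}}\{(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1\} \quad \text{(lxv.)}.$$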

The latter form, besides the correlation coefficient and correlation ratio, requires only
a knowledge of the skew variation constants $\beta_1$ and $\beta_2$, and is therefore very easy to
determine. Except for very nearly linear regression, there can be no doubt as to the
sign of $\sqrt{\eta^2-r^2}$, as we can tell at once whether the parabola ought to be concave or
convex to the $x$-axis. In other cases the sign of $\sqrt{\eta^2-r^2}$ must be taken to coincide
with that of $\epsilon$, which must therefore be found. It will then be as easy to use (lxiv.)
as (lxv.), although probably $\eta$ and $r$ can be found with less error than $\epsilon$.

It is thus quite easy to allow for such curvature of the regression line as can be
expressed by a parabola of the 2nd order of the type considered.
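
As an illustration of how little (lxv.) requires, the Python sketch below turns $r$, $\eta$, $\sqrt{\beta_1}$, $\beta_2$ and the two standard deviations into the coefficients of a parabola in $X_p$ (measured from the mean of $x$). The function name `parabola_from_lxv` and its arguments are hypothetical; the sign of the root must be supplied by the user, as discussed above. The numbers in the usage part are the woodruff constants of Illustration A, where $\epsilon$ is negative.

```python
import math

def parabola_from_lxv(r, eta, root_beta1, beta2, sigma_x, sigma_y, y_mean, sign):
    """Return (c0, c1, c2) with  y = c0 + c1*X + c2*X^2, following (lxv.)."""
    beta1 = root_beta1 ** 2
    e = sign * math.sqrt((eta ** 2 - r ** 2) / (beta2 - beta1 - 1.0))
    c0 = y_mean - sigma_y * e                         # from the "-1" in the bracket
    c1 = sigma_y * (r - e * root_beta1) / sigma_x     # coefficient of X
    c2 = sigma_y * e / sigma_x ** 2                   # coefficient of X^2
    return c0, c1, c2

if __name__ == "__main__":
    c0, c1, c2 = parabola_from_lxv(r=-0.207579, eta=0.249911,
                                   root_beta1=0.130487, beta2=1.828767,
                                   sigma_x=1.336887, sigma_y=0.897842,
                                   y_mean=6.655375, sign=-1)
    print(round(c0, 6), round(c1, 6), round(c2, 6))
    # abscissa of zero regression, measured from the mean of x
    print(round(-c1 / (2 * c2), 6))
```

The abscissa printed last is the point of zero regression of the fitted parabola, expressed as a deviation from the mean position.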

We notice at once that the regression curve does not pass through the mean of the
two characters. Or, an individual with the mean of one character will most probably
not have the mean of a second character. This is a rather important result, which
follows at once for nearly all types of skew correlation.

It will be seen, for example, that Quetelet's "mean man," defended by Professor
Edgeworth as theoretically justifiable, depends entirely on human characters giving
linear regression curves. Such linear curves are certainly given by many pairs of
characters, e.g., cranial and body measurements, but there are certainly other
characters for which regression ceases to be sensibly linear, and the conception of the
"mean man" in this case fails. For example, if age be considered as a character,
then the regression is certainly not linear, and the individual of mean age will not
necessarily have either the mean physical or psychical characters. This seems of
some importance for the general conception of "type," if by type we denote the mean,
for probably there are other characters than age for which regression is skew.

The regression, i.e., $dY_{x_p}/dX_p$, will be zero for a point $X_p(\text{max.})$ for which

$$\frac{X_p(\text{max.})}{\sigma_x} = \tfrac{1}{2}\Big\{\sqrt{\beta_1} - \frac{r\sqrt{\phi_2}}{\sqrt{\eta^2-r^2}}\Big\} \quad \text{(lxvi.)},$$

the sign of the root being determined as before. Clearly, therefore, unless $r$ be very
small, or $\eta^2$ diverges very sensibly from $r^2$, this point of zero regression may correspond



to a very large abscissa, and in some cases will lie entirely outside the range of
observable frequency.

The parabola of regression cuts the line of regression, i.e., the line of best fit to
the series of regression points, or to the means of the $x$-arrays, in two points
determined by the quadratic equation

$$(X_p/\sigma_x)^2 - \sqrt{\beta_1}(X_p/\sigma_x) - 1 = 0,$$

or

$$X_p/\sigma_x = \tfrac{1}{2}\{\sqrt{\beta_1} \pm \sqrt{\beta_1+4}\} \quad \text{(lxvii.)}.$$

These points are always real, and correspond, if regression be truly parabolic, to
the same values of the $x$-character, whatever be the $y$-character of which we are
considering the correlation. In the case of normal variation of the $x$-character
only, these are the points of inflexion of the $x$-distribution.




(7.) Linear Regression. 

In this case it is necessary that both $b_2$ and $b_3$ vanish within the limits of random
sampling, and, although these are not theoretically sufficient (for a whole series of
relations between the higher product-moments could be written down*) they are for
practical purposes sufficient.

Hence we have the following conditions for linear regression:

$$\eta^2 = r^2 \quad \text{(lxviii.)},$$

or, the coefficient of correlation, without regard to sign, should be equal to the
correlation ratio. Further $\epsilon$ should be zero, or

$$p_{21}p_{20} - p_{11}p_{30} = 0 \quad \text{(lxix.)}.$$

The theory of linear regression is so familiar that it need not be further discussed
here. In the actual practice of statistics, the determination of the means of the
$x$-arrays and the drawing of the regression line will often suffice to show the fairly
trained eye whether the deviations from it are random or not. If they are not
random, then we must proceed to the determination of $\eta$ and of the higher product-moments.

The following are numerical examples of skew correlation, selected to illustrate the
theory developed above.

* For example, it is necessary in most cases that $\zeta$ should vanish. In the instance of that very special
case of linear regression, the Gauss-Laplacian normal frequency, it is easy to show that the constants $\epsilon$, $\zeta$
both vanish as well as $\eta^2-r^2$.






Statistical Illustrations.

(8.) Illustration A. On the Skew Correlation between Number of Branches to the
Whorl and Position of the Whorl on the Spray in the case of Asperula odorata.

In this case the material was collected in a lane near Horsham, Sussex, at
Whitsuntide, 1903, by Miss M. Radford. There were 150 independent sprays, the
woodruff had just flowered, and the whorls were counted from the flower downwards.
Being early in the season, the maximum number of whorls was five, and, in some
cases, not even as many were available. The material was counted and tabled by
the author, and the results are exhibited in the table below:

Table I. Correlation of Whorl-Branches and Position of Whorl.

Position        Number of branches in whorl.
of whorl.       4.    5.    6.    7.    8.    $n_{x_p}$   $\bar{y}_{x_p}$   $\sigma_{n_{x_p}}$   $m_2$    $m_3$

First  . .      .     3     66    42    39    150     6.7800    .8553     .7316    .1535
Second .        .     3     61    47    39    150     6.8133    .8437     .7117    .0985
Third  . .      .     6     60    40    44    150     6.8133    .9047     .8185    .0383
Fourth .        1     12    68    39    22    142     6.4859    .8780     .7709    .1347
Fifth  . .      1     13    53    10    10     87     6.1724    .8605     .7404    .4049

Totals . .      2     37    308   178   154   679     6.6554    .         .        .



We require the regression curve giving the probable number of branches for a
given whorl.

Dealing first with the skew variation in position, a purely arbitrary system
depending solely on the number of whorls dealt with in each position, we find, not
using Sheppard's correction,*

Mean = 2.802,651,       $\nu_2$ = 1.787,268,      $\nu_5$ = 2.799,638,
$\sigma_x$ = 1.336,887,          $\nu_3$ = .311,783,       $\nu_6$ = 22.678,308.
                        $\nu_4$ = 5.841,682.

Hence we determine

$\beta_1$ = .017,027,          $\phi_2$ = .811,740,
$\beta_2$ = 1.828,767,         $\phi_3$ = .286,465,
$\beta_3$ = .085,545,          $\phi_4$ = .610,879,
$\beta_4$ = 3.972,295, and $\sqrt{\beta_1}$ = +.130,487.

* The numbers are tabulated to six places, because we cannot be sure that the final calculations are for 
the data true to two places, which is all we finally retain unless this is done. Any number of figures can 
really be retained with perfect ease when the work is done on a calculator. 
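
The skew-variation constants above can be recomputed directly from the array totals of Table I. The Python sketch below does so; the helper names `central_moment` and `skew_constants` are illustrative only, and the definitions of the $\beta$'s and $\phi$'s are those of (xliv.) and of the cubical regression section, without Sheppard's correction.

```python
import math

def central_moment(values, freqs, order):
    """Frequency-weighted moment of the given order about the mean."""
    n = sum(freqs)
    mean = sum(f * v for f, v in zip(freqs, values)) / n
    return sum(f * (v - mean) ** order for f, v in zip(freqs, values)) / n

def skew_constants(values, freqs):
    """nu_2..nu_6, the betas of (xliv.) and phi_2, phi_3, phi_4."""
    nu = {q: central_moment(values, freqs, q) for q in range(2, 7)}
    beta1 = nu[3] ** 2 / nu[2] ** 3
    beta2 = nu[4] / nu[2] ** 2
    beta3 = nu[3] * nu[5] / nu[2] ** 4
    beta4 = nu[6] / nu[2] ** 3
    # sqrt(beta1) carries the sign of the third moment;
    # phi3 is undefined for a perfectly symmetrical x-distribution (nu_3 = 0)
    root_beta1 = math.copysign(math.sqrt(beta1), nu[3])
    return {
        "nu": nu, "beta1": beta1, "beta2": beta2, "beta3": beta3, "beta4": beta4,
        "root_beta1": root_beta1,
        "phi2": beta2 - beta1 - 1.0,
        "phi3": (beta3 - beta1 * beta2 - beta1) / root_beta1,
        "phi4": beta4 - beta2 ** 2 - beta1,
    }

if __name__ == "__main__":
    positions = [1, 2, 3, 4, 5]            # whorl positions
    counts = [150, 150, 150, 142, 87]      # n_{x_p} from Table I
    for key, value in skew_constants(positions, counts).items():
        print(key, value)
```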




We now turn to the skew variation in the number of branches to the whorl, and
get the following constants:

Mean = 6.655,375,       $\mu_2$ = .806,124,
$\sigma_y$ = .897,842,           $\mu_3$ = .132,090,
                        $\mu_4$ = 1.138,410.

The values of $\bar{y}_{x_p}$, $m_2$, and $m_3$ are given in table above. Using them we find

$\sigma_M$ = .224,377,  $\eta$ = .249,911,  $\sigma_y\sqrt{1-\eta^2}$ = .869,355,

$\lambda_2 = \sigma_M^2$ = .050,345,  $\lambda_4$ = .007,474,  $\chi_1$ = .990,862,  $\chi_2$ = -.059,851.

These give by (xxxiii.), showing the numerical contribution of each term,

$$\Sigma_\eta^2 = \frac{1}{N}\{.878,991 - .010,323 - .000,888 - .007,231 + .013,578\},$$

or the probable error of $\eta$ = .0242.

Had we calculated the probable error of $\eta$ from (xxxiv.), we should have found for
its value .0243. It is clear that for this special case the simple formula (xxxiv.) is
amply sufficient, the small terms almost cancelling.
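
The constants $\sigma_M$, $\eta$ and $\chi_1$ can be recovered from the frequencies of Table I. with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, the cell frequencies are those of Table I. (rows are whorl positions, columns are 4 to 8 branches), and $\chi_1$ is computed from the definition (xxxv.).

```python
import math

BRANCHES = [4, 5, 6, 7, 8]
TABLE = [            # rows: first to fifth whorl
    [0, 3, 66, 42, 39],
    [0, 3, 61, 47, 39],
    [0, 6, 60, 40, 44],
    [1, 12, 68, 39, 22],
    [1, 13, 53, 10, 10],
]

def array_summary(row):
    """Return (n, mean, variance) of one x-array of y's."""
    n = sum(row)
    mean = sum(f * y for f, y in zip(row, BRANCHES)) / n
    var = sum(f * (y - mean) ** 2 for f, y in zip(row, BRANCHES)) / n
    return n, mean, var

def correlation_ratio(table):
    arrays = [array_summary(row) for row in table]
    total = sum(n for n, _, _ in arrays)
    y_mean = sum(n * m for n, m, _ in arrays) / total
    var_means = sum(n * (m - y_mean) ** 2 for n, m, _ in arrays) / total  # sigma_M^2
    var_within = sum(n * v for n, _, v in arrays) / total                 # sigma_y^2 (1 - eta^2)
    eta = math.sqrt(var_means / (var_means + var_within))
    chi1 = (sum(n * v * (m - y_mean) ** 2 for n, m, v in arrays)
            / (total * var_within * var_means))
    return {"N": total, "sigma_M": math.sqrt(var_means), "eta": eta, "chi1": chi1}

if __name__ == "__main__":
    print(correlation_ratio(TABLE))
```

A value of $\chi_1$ close to unity is what the text describes as a sensibly homoscedastic system.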

We see that $\chi_1$ is almost unity, and the graph of $\sigma_{n_{x_p}}/\sigma_y$ shows indeed that the system
is sensibly homoscedastic. $\chi_2$ is small, but a glance at the graph of the clitic curve
on Diagram I. shows that we can hardly treat the system as homoclitic, the changes
in the skewness forming a fairly uniform curve.*

For practical purposes, we may treat the variability of the number of branches in
any array as sufficiently closely given by $\sigma_y\sqrt{1-\eta^2}$.
We now turn to the product-moments† and find

$p_{11}$ = -.249,160,    $p_{31}$ = -.896,415,
$p_{21}$ = -.236,289,    $p_{41}$ = -1.210,225.

* Throughout these illustrations the clitic curve is plotted by calculating the skewness of the arrays
from $\tfrac{1}{2}m_3/m_2^{3/2}$. See p. 23.

† In calculating these products referred to the centroid from those referred to any axes, generally
corresponding to whole numbers in the table, the following reduction formulae will be found useful.
We take $N\Pi_{qq'} = S(n_{xy}x^qy^{q'})$, $x$ and $y$ being measured from any axes, further, $\bar{x}'$, $\bar{y}'$ are the distances of the
means from these axes, and $\nu_2$, $\nu_3$, $\nu_4$ the moments of the $x$-character about its mean as tabled above.

$$p_{41} = \Pi_{41} - 4\bar{x}'\Pi_{31} + 6\bar{x}'^2\Pi_{21} - 4\bar{x}'^3\Pi_{11} + \bar{x}'^4\Pi_{01} - \bar{y}'\nu_4.$$

The $p$'s should be further corrected for grouping by Sheppard's corrections (given on my p. 36), provided
there be high contact at the contour of the surface of frequency. Sheppard's corrections have not in this



These lead to

$r$ = -.207,579,  $\epsilon$ = -.120,164,  $\zeta$ = -.038,241,  $\theta$ = -.285,890.

Thus all the constants are determined.
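
The step from the product-moments to $r$, $\epsilon$, $\zeta$ and $\theta$ can be written out as follows. This Python sketch uses the product-moment form (xlvi.) of the constants; the function name `regression_constants` is hypothetical, and the inputs are the woodruff moments quoted above.

```python
import math

def regression_constants(p11, p21, p31, p41, nu2, nu3, nu4, nu5, sigma_y):
    """r, epsilon, zeta, theta from product-moments, using the form (xlvi.)."""
    sx = math.sqrt(nu2)
    r = p11 / (sx * sigma_y)
    eps = (p21 * nu2 - p11 * nu3) / (sx ** 4 * sigma_y)
    zeta = (p31 * nu2 - p11 * nu4) / (sx ** 5 * sigma_y)
    theta = (p41 * nu2 - p11 * nu5) / (sx ** 6 * sigma_y)
    return r, eps, zeta, theta

if __name__ == "__main__":
    r, eps, zeta, theta = regression_constants(
        p11=-0.249160, p21=-0.236289, p31=-0.896415, p41=-1.210225,
        nu2=1.787268, nu3=0.311783, nu4=5.841682, nu5=2.799638,
        sigma_y=0.897842)
    print(round(r, 6), round(eps, 6), round(zeta, 6), round(theta, 6))
    # the first two criteria of regression, phi_2 = .811740 and eta = .249911 being taken from the text
    eta, phi2 = 0.249911, 0.811740
    print(round(eta ** 2 - r ** 2, 6), round(phi2 * (eta ** 2 - r ** 2) - eps ** 2, 6))
```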

We find

$$\eta^2-r^2 = .019,367,$$

$$\phi_2(\eta^2-r^2)-\epsilon^2 = .001,281,$$

$$\phi_2(\eta^2-r^2)-\epsilon^2-(\zeta\phi_2-\epsilon\phi_3)^2/(\phi_2\phi_4-\phi_3^2) = .000,276.$$

These should be respectively zero for linear, parabolic, and cubical regressions. It
will be seen that they are satisfied with increasing closeness; we might well be
satisfied even with the parabolic regression curve. The following are the regression
curves determined, $\bar{y}_{x_p}$ being the actual number of branches in the whorl
($= 6.655,375 + Y_{x_p}$), and $x_p$ the actual position of the whorl:

(a.) Straight line:

$y_{x_p} = 7.046087 - 0.139408\,x_p.$

(b.) Parabola from (lxv.):

$y_{x_p} = 6.794052 - 0.125872\,X_p - 0.077592\,X_p^2$;
or,

$y_{x_p} = 6.853561 - 0.077592\,(x_p - 1.991535)^2.$

This clearly gives a maximum number of branches, 6.8536, corresponding to
$x_p = 1.9915$, a value within the limits of observation.

(c.) Cubic from (lix.):

$y_{x_p} = 6.799399 - 0.192439\,X_p - 0.084230\,X_p^2 + 0.020915\,X_p^3.$

Here $X_p$ is measured from the mean position, $= x_p - 2.802651$, and $y_{x_p}$ is, as before,
the total number of branches for the given position.

Condition (lvii.) is so closely satisfied that we shall here get sensibly as good
results from (lix.) as from (lvi.).

In the table below and in the curves of Diagram I. the values of the mean of 
the arrays, as found from line, parabola, and cubic, are given and compared with 
observation. 

case been used, as this condition is not fulfilled. The axes $x'$, $y'$ actually taken for woodruff were those
through the third whorl and through six branches.

An obvious warning about the signs of the sums of the products may be given which may save
computators some trouble. The axes being taken positive, as in the accompanying
figure, then the sums of the products for $\Pi_{11}$ and $\Pi_{31}$ are positive in the 1st and
3rd, negative in the 2nd and 4th quadrants. For $\Pi_{21}$ and $\Pi_{41}$ they are positive
in the 1st and 4th quadrants and negative in the 2nd and 3rd quadrants. In
the figure the axes are taken so as to suit the $x$ and $y$-directions of the table on
p. 31. Care must, of course, be paid to this point. The products may also
be found from the $y_{x_p}$'s in the manner indicated on p. 35, footnote. They were thus verified in this case.

[Figure: the four quadrants, numbered 1st to 4th, about the axes of $x$ and $y$.]




Table II. Mean Branches to each Whorl.

$x_p$ =                     0       1       2       3       4       5       6
$y_{x_p}$ from line         7.046   6.907   6.767   6.628   6.488   6.349   6.210
$y_{x_p}$ from parabola     6.546   6.777   6.854   6.775   6.541   6.151   5.607
$y_{x_p}$ from cubic        6.117   6.750   6.889   6.758   6.443   6.192   6.007
Observed                            6.780   6.813   6.813   6.486   6.172

1 



I think we may safely say that in the relationship of branches to position of the
whorl in woodruff we have a case of homoscedastic correlation, which is effectively
described by a parabolic regression curve. Thus, in a case of this kind, it is only
needful, besides the moments up to the fourth of the $x$-character, to find the
correlation coefficient $r$ and the correlation ratio $\eta$.



(9.) Illustration B. — On the Correlation between Age and Head Height in Girls. 



The data for this are taken from my School Measurement series, and involve the 
auricular heights of 2272 girls between the ages of 3 and 22. There was considerable 
paucity of material at the extreme ends of the range, and accordingly as our correlation 
curves are all obtained by weighting the observations, we can hardly expect good fits 
near 3 or 22 years of age. The actual correlation table is given as Table III. 
Sheppard's corrections were applied throughout, and the unit of height is 2 millims. 
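The grouping correction mentioned here can be illustrated on the height totals of Table III. The following is only a sketch: the function name and layout are illustrative, the totals are those legible in the margin of the table, and one total is read as 131 where this copy shows 181, an assumption adopted because it makes the totals add to 2272. Sheppard's correction for the second moment is taken as the subtraction of 1/12 in units of the group width.

```python
# Height-group totals of Table III, lowest group (102.25-104.25 mm) first.
# Assumption: the fifteenth total is read as 131 (this copy prints 181),
# the reading that makes the totals add up to 2272.
totals = [2, 10, 10, 27, 56, 59, 115, 142, 244, 265, 261, 265,
          219, 197, 131, 88, 77, 52, 20, 16, 11, 4, 1]


def grouped_moments(freq, first_mid, width):
    """Return (N, mean in original units, corrected second moment in group units)."""
    n = sum(freq)
    mean_idx = sum(i * f for i, f in enumerate(freq)) / n
    raw_mu2 = sum(f * (i - mean_idx) ** 2 for i, f in enumerate(freq)) / n
    mu2 = raw_mu2 - 1.0 / 12.0  # Sheppard's correction for the second moment
    mean = first_mid + width * mean_idx
    return n, mean, mu2


# Mid-point of the first group is 103.25 mm; groups are 2 mm wide.
n, mean_height, mu2 = grouped_moments(totals, first_mid=103.25, width=2.0)
sigma_y = mu2 ** 0.5
print(n, round(mean_height, 4), round(mu2, 6), round(sigma_y, 6))
```

Under these assumptions the mean, second moment and standard deviation should agree closely with the height constants quoted for this illustration.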

In the first place the means, standard deviations, and third moments of all the arrays
of heights for different years of age were determined. These are given at the foot of
Table III., but in actually calculating the constants more places of decimals were
used. Then the first six moments of the frequency of the ages were found and the
first four moments of the height frequencies. These are the $x$ and $y$-frequencies.
They give us:



Table III. (to face page 34). Correlation table of age and auricular height for 2272 girls. Of the body of the table only the totals of the height groups can be read in this copy:

Height (millims.)      Total
102.25-104.25            2
104.25-106.25           10
106.25-108.25           10
108.25-110.25           27
110.25-112.25           56
112.25-114.25           59
114.25-116.25          115
116.25-118.25          142
118.25-120.25          244
120.25-122.25          265
122.25-124.25          261
124.25-126.25          265
126.25-128.25          219
128.25-130.25          197
130.25-132.25          131
132.25-134.25           88
134.25-136.25           77
136.25-138.25           52
138.25-140.25           20
140.25-142.25           16
142.25-144.25           11
144.25-146.25            4
146.25-148.25            1
Totals                2272

For the totals: mean in 1-millim. units, 124.0467; standard deviation in 2-millim. units, 3.4541; third moment in 2-millim. units, +5.206.






Height Constants.

Mean height = 124.0467 millims.
$\sigma_y = 3.454125$, $\mu_2 = 11.930977$, $\mu_3 = 5.206247$, $\mu_4 = 438.639633$ (in 2-millim. units),
$\beta_1' = 0.015960$, $\beta_2' = 3.081454$.

Further

$\Sigma_M = 2.093366$ millims.,
$\lambda_2 = 4.382181$, $\lambda_4 = 62.399135$ (in 1-millim. units).

Hence

$(\lambda_4 - 3\lambda_2^2)/(4\lambda_2^2) = 0.062340$.

Age Constants (in year units).

Mean age = 12.7007,
$\sigma_x = 3.064819$,
$\nu_2 = 9.393110$, $\nu_3 = 1.051882$, $\nu_4 = 239.157055$,
$\nu_5 = 104.298702$, $\nu_6 = 9536.265059$,
$\beta_1 = 0.001335$, $\beta_2 = 2.710593$, $\beta_3 = 0.014093$, $\beta_4 = 11.506681$,
$\sqrt{\beta_1} = +0.036538$,
$\phi_2 = 1.709258$, $\phi_3 = 0.250123$, $\phi_4 = 4.158032$.
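The age constants above hang together arithmetically, and a short computation makes the dependence explicit. This is a sketch under stated assumptions: the $\beta$'s are formed from the moments in the usual way, while the three expressions used for $\phi_2$, $\phi_3$, $\phi_4$ are not quoted from the memoir but inferred, because they reproduce the printed values; the function name is illustrative.

```python
import math

# Moments of the age distribution about its mean (year units), as printed.
nu2, nu3, nu4, nu5, nu6 = 9.393110, 1.051882, 239.157055, 104.298702, 9536.265059


def beta_and_phi(nu2, nu3, nu4, nu5, nu6):
    """Return the beta and phi constants derived from central moments nu2..nu6."""
    beta1 = nu3 ** 2 / nu2 ** 3
    beta2 = nu4 / nu2 ** 2
    beta3 = nu3 * nu5 / nu2 ** 4
    beta4 = nu6 / nu2 ** 3
    # sqrt(beta1) carries the sign of the third moment.
    root_beta1 = math.copysign(math.sqrt(beta1), nu3)
    gamma5 = nu5 / nu2 ** 2.5  # fifth moment in standard units
    # Inferred (assumed) forms of the phi constants:
    phi2 = beta2 - beta1 - 1.0
    phi3 = gamma5 - beta2 * root_beta1 - root_beta1
    phi4 = beta4 - beta2 ** 2 - beta1
    return {"beta1": beta1, "beta2": beta2, "beta3": beta3, "beta4": beta4,
            "root_beta1": root_beta1, "phi2": phi2, "phi3": phi3, "phi4": phi4}


constants = beta_and_phi(nu2, nu3, nu4, nu5, nu6)
for name, value in constants.items():
    print(f"{name:>10s} = {value:.6f}")
```

Working from the moments rather than from the rounded $\beta$'s avoids compounding rounding errors, which is presumably why more decimal places were carried in the original computation.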



In the next place the products were worked out and referred to the means with
the following results:*

$p_{11} = 3.113712$, whence $r = 0.294128$,
$p_{21} = -1.957022$, $\epsilon = -0.071065$,
$p_{31} = 74.447616$, $\zeta = -0.048576$,
$p_{41} = -108.701559$, $\theta = -0.470126$.

Further, from $\Sigma_M$, $\eta = 0.303024$.
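How the four constants follow from the product-moments can be sketched numerically. The definitions used below are an assumption: they are inferred from the printed figures, which they reproduce, rather than quoted from the earlier sections of the memoir. The name `theta` for the fourth constant is likewise a label chosen here, since its symbol is not legible in this copy.

```python
def reduced_products(p11, p21, p31, p41, sigma_x, sigma_y, nu3, nu4, nu5):
    """Return (r, eps, zeta, theta) from product-moments about the means.

    Assumed definitions (x in standard units):
      r     = p11 / (sx * sy)
      eps   = p21 / (sx^2 * sy) - r * sqrt(beta1)
      zeta  = p31 / (sx^3 * sy) - r * beta2
      theta = p41 / (sx^4 * sy) - r * nu5 / sx^5
    """
    r = p11 / (sigma_x * sigma_y)
    root_beta1 = nu3 / sigma_x ** 3
    beta2 = nu4 / sigma_x ** 4
    gamma5 = nu5 / sigma_x ** 5
    eps = p21 / (sigma_x ** 2 * sigma_y) - r * root_beta1
    zeta = p31 / (sigma_x ** 3 * sigma_y) - r * beta2
    theta = p41 / (sigma_x ** 4 * sigma_y) - r * gamma5
    return r, eps, zeta, theta


# Printed values for age (x) and head height (y, in 2-millim. units).
r, eps, zeta, theta = reduced_products(
    p11=3.113712, p21=-1.957022, p31=74.447616, p41=-108.701559,
    sigma_x=3.064819, sigma_y=3.454125,
    nu3=1.051882, nu4=239.157055, nu5=104.298702,
)
print(round(r, 6), round(eps, 6), round(zeta, 6), round(theta, 6))
```

Each constant is the corresponding product-moment in standard units less what linear regression alone would contribute, so all three of `eps`, `zeta` and `theta` vanish when the regression is strictly linear.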

In deducing the product-moments after they had been referred to the means, the

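* These products were in this case (as in all other cases) verified by calculating from the means of the
arrays $y_{x_p}$ the expressions

$S\{n_{x_p} y_{x_p}(x_p - \bar x)\}$, $S\{n_{x_p} y_{x_p}(x_p - \bar x)^2\}$, $S\{n_{x_p} y_{x_p}(x_p - \bar x)^3\}$, $S\{n_{x_p} y_{x_p}(x_p - \bar x)^4\}$.

Of course it is easiest to calculate these products about some arbitrary origin coinciding with the
abscissa of one array. If these products be then $p'_{11}$, $p'_{21}$, $p'_{31}$, $p'_{41}$, and $\bar x'$ be the mean, we have

$p_{11} = p'_{11}$,

$p_{21} = p'_{21} - 2\bar x' p'_{11}$,

$p_{31} = p'_{31} - 3\bar x' p'_{21} + 3\bar x'^2 p'_{11}$,

$p_{41} = p'_{41} - 4\bar x' p'_{31} + 6\bar x'^2 p'_{21} - 4\bar x'^3 p'_{11}$.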


proper Sheppard's corrections were introduced. These are, if $\{p_{11}\}$, $\{p_{21}\}$, $\{p_{31}\}$,
$\{p_{41}\}$ represent the uncorrected moments:

$p_{11} = \{p_{11}\}$, $p_{21} = \{p_{21}\}$,

$p_{31} = \{p_{31}\} - \tfrac{1}{4}\{p_{11}\}$, $p_{41} = \{p_{41}\} - \tfrac{1}{2}\{p_{21}\}$,

the units of grouping being the units throughout.
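The two steps just described, transfer of the products to the mean and correction for grouping, can be sketched as follows. The small frequency table is hypothetical and serves only to check the transfer formulae against a direct calculation; the function names are illustrative, and the correction fractions are taken as one quarter and one half, with the group width as unit.

```python
def products_about_mean(pp11, pp21, pp31, pp41, xbar):
    """Transfer products taken about an arbitrary x-origin (y about its mean)
    to the mean of x; xbar is the mean measured from that origin."""
    p11 = pp11
    p21 = pp21 - 2.0 * xbar * pp11
    p31 = pp31 - 3.0 * xbar * pp21 + 3.0 * xbar ** 2 * pp11
    p41 = pp41 - 4.0 * xbar * pp31 + 6.0 * xbar ** 2 * pp21 - 4.0 * xbar ** 3 * pp11
    return p11, p21, p31, p41


def sheppard_corrected(p11, p21, p31, p41):
    """Grouping corrections for the product-moments (unit = group width)."""
    return p11, p21, p31 - p11 / 4.0, p41 - p21 / 2.0


# Hypothetical table: (x, mean y of the array, frequency of the array).
cells = [(0, 5.0, 3), (1, 6.0, 7), (2, 6.5, 9), (3, 6.2, 6), (4, 5.1, 2)]
n = sum(f for _, _, f in cells)
xbar = sum(f * x for x, _, f in cells) / n
ybar = sum(f * y for _, y, f in cells) / n


def product_moment(s, origin):
    """(1/N) * S{ n * (x - origin)^s * (y - ybar) }."""
    return sum(f * (x - origin) ** s * (y - ybar) for x, y, f in cells) / n


primed = [product_moment(s, 0.0) for s in (1, 2, 3, 4)]   # about x = 0
shifted = products_about_mean(*primed, xbar)              # via the formulae
direct = tuple(product_moment(s, xbar) for s in (1, 2, 3, 4))
corrected = sheppard_corrected(*shifted)
print(shifted)
print(direct)
```

The agreement of `shifted` with `direct` is the same kind of verification as that described in the footnote on the calculation of the products from the means of the arrays.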
From the constants for the arrays, I found

$\chi_1 - 1 = -0.000675$, $\chi_2 = -0.007198$.

Whence the probable error of $\eta$ was determined by (xxxiii.). Its value was*

Probable error of $\eta = 0.012913$.

If found from the simple formula $0.67449\,(1-\eta^2)/\sqrt{N}$, the value is 0.012851. We
accordingly are again forced to the conclusion that the probable error of $\eta$ may for practical purposes be
found from this simple formula, instead of the complicated result (xxxiii.). Although
both $\chi_1 - 1$ and $\chi_2$ are small, it is very doubtful whether we can legitimately consider
the system as homoscedastic. The dotted line $ab$ of Diagram II. would fairly well
represent increasing variability with age. The skewness of the arrays is relatively
small and changes sign so frequently, that we can certainly not attribute any law to
such heteroclitic tendencies as there are. They are probably due to errors of random
sampling from truly isocurtic material.
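The comparison between the two forms of the probable error is easily repeated. The sketch below assumes only the values given in the text, namely $\eta = 0.303024$ and $N = 2272$, together with the five contributions to (xxxiii.) quoted in the footnote; the function names are illustrative.

```python
import math


def probable_error_simple(eta, n):
    """Simple formula: 0.67449 * (1 - eta^2) / sqrt(N)."""
    return 0.67449 * (1.0 - eta ** 2) / math.sqrt(n)


def probable_error_full(terms, n):
    """Fuller form: 0.67449 * sqrt((sum of the contributions) / N)."""
    return 0.67449 * math.sqrt(sum(terms) / n)


eta, n = 0.303024, 2272
terms = [0.824785, 0.001870, 0.004673, -0.000472, 0.001888]

pe_simple = probable_error_simple(eta, n)
pe_full = probable_error_full(terms, n)
print(round(pe_simple, 6), round(pe_full, 6))
```

The first of the five contributions is very nearly $(1-\eta^2)^2$, so that the remaining four measure how far heteroscedasticity and heteroclisy disturb the simple formula.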

It will be seen that the height frequencies with $\beta_1' = 0.0160$ and $\beta_2' = 3.0815$ do not
differ very much from a normal distribution; in fact, we can lay no stress on the
heteroclisy of the system at all. But the values of the standard deviations of the
arrays, or the graph of $\sigma_{n_x}/\sigma_y$, certainly shows increasing variation with increasing age,
a phenomenon with which one is familiar in a variety of other human characters.†

This heteroscedasticity, due to increasing variation with growth, would hardly have
been anticipated from a mere inspection of the smallness of $\chi_1 - 1$, and is somewhat
obscured by the irregular values of the standard deviations of the small arrays at
the adult end of the age range. The mean value of the standard deviation of the
weighted arrays is $\sigma_y\sqrt{1-\eta^2} = 3.2992$ in 2-millim. units.

We now turn to the regression curves to see how far the conditions for the
different types are satisfied. We have

$\eta^2 - r^2 = 0.005312$,

$\phi_2(\eta^2 - r^2) - \epsilon^2 = 0.004030$,

* The contributions of the successive terms of (xxxiii.) are in fact given by

$\Sigma_\eta^2 = \frac{1}{N}\{0.824785 + 0.001870 + 0.004673 - 0.000472 + 0.001888\}$.

† See Pearson: 'The Chances of Death and other Studies of Evolution,' vol. I., pp. 296, 307,
310, 314.






But the first should be zero, if the regression be linear; the second, if it be
parabolic; and the third, if it be cubical.

We see increasing approximation to fulfilment of the several conditions. Referred
to axes through the mean age and head height, the following are the regression
curves*:

(a.) Straight line:

$Y_{x_p} = 0.662979\,X_p.$

(b.) Parabola (from equation (lxv.)):

$Y_{x_p} = 0.055749 + 0.667570\,X_p - 0.041001\,X_p^2.$

(c.) Cubic (from equation (lvi.)):

$Y_{x_p} = 0.280194 + 0.722886\,X_p - 0.029580\,X_p^2 - 0.002223\,X_p^3.$

(c'.) Cubic (from equation (lix.)):

$Y_{x_p} = 0.296076 + 0.812249\,X_p - 0.028004\,X_p^2 - 0.005740\,X_p^3.$

(c') will not give as good results as (c), for it depends on a use of the condition
(lvii.) which is not absolutely fulfilled.
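The regression curves (a), (b) and (c) may be evaluated at any age by a few lines of code. This is a sketch only: it takes the printed coefficients, the mean age 12.7007 and the mean height 124.0467 millims. as given, and the function names are illustrative. The vertex of the parabola is found by the elementary rule $X = -c_1/(2c_2)$.

```python
MEAN_AGE = 12.7007       # years
MEAN_HEIGHT = 124.0467   # millimetres

# Coefficients in ascending powers of X (age measured from the mean, in years).
CURVES = {
    "line": (0.0, 0.662979),
    "parabola": (0.055749, 0.667570, -0.041001),
    "cubic_c": (0.280194, 0.722886, -0.029580, -0.002223),
}


def fitted_height(age, coeffs):
    """Mean auricular height (mm) given by a regression curve at a given age."""
    x = age - MEAN_AGE
    return MEAN_HEIGHT + sum(c * x ** k for k, c in enumerate(coeffs))


def parabola_vertex(c0, c1, c2):
    """Abscissa (from the mean) and ordinate (from the mean) of the vertex."""
    x = -c1 / (2.0 * c2)
    return x, c0 + c1 * x + c2 * x * x


for age in (3.5, 12.5, 22.5):
    row = [round(fitted_height(age, c), 2) for c in CURVES.values()]
    print(age, row)

vx, vy = parabola_vertex(*CURVES["parabola"])
print(round(MEAN_AGE + vx, 2), round(MEAN_HEIGHT + vy, 2))
```

The values so found can be set beside the columns of the table of mean auricular heights; cubic (c') is left out of the sketch because its printed constant term is hard to read in this copy.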

The following table gives the values in the case of the four curves : — 

Table IV. $y_{x_p}$ = Mean Auricular Height of Girl's Head at Given Age.

$x_p$ = age.   Regression line.   Regression parabola.†   Cubic (c).   Cubic (c').   Observed.
   3.5           117.95             114.49                 116.90       118.94        115.25
   4.5           118.61             115.87                 117.66       118.94        116.96
   5.5           119.27             117.17                 118.42       119.16        117.47
   6.5           119.94             118.39                 119.24       119.57        119.10
   7.5           120.60             119.52                 120.08       120.14        120.30
   8.5           121.26             120.57                 120.93       120.84        121.63
   9.5           121.92             121.55                 121.78       121.62        121.72
  10.5           122.59             122.43                 122.62       122.45        122.82
  11.5           123.25             123.24                 123.42       123.26        123.14
  12.5           123.91             123.97                 124.18       124.15        123.89
  13.5           124.58             124.61                 124.88       124.95        124.86
  14.5           125.24             125.17                 125.52       125.65        125.71
  15.5           125.90             125.65                 126.07       126.22        126.16
  16.5           126.57             126.05                 126.52       126.68        126.53
  17.5           127.23             126.36                 126.87       126.93        126.91
  18.5           127.89             126.59                 127.09       126.96        127.02
  19.5           128.55             126.75                 127.18       126.74        129.56
  20.5           129.22             126.81                 127.11       126.22        123.82
  21.5           129.88             126.80                 126.88       125.38        126.50
  22.5           130.54             126.71                 126.48       124.28        125.25



* $Y_{x_p}$ is here measured in millimetres and $X_p$ in years.

† The maximum ordinate is at vertex of parabola, i.e., $X_p = 8.1409$, or age 20.84; its magnitude = 126.82.




An examination of this table and the graphs on Diagram II. seems to show:

(i.) That cubic (c) is considerably better than cubic (c').

(ii.) That we do get a sensible betterment in passing from parabola to cubic, and,
accordingly, that we must use in this case the cubic to effectively describe the regression
within the range of observation. Probably neither cubic nor parabola would effectively
serve for extrapolation even close to the limits of observation.

Thus the cubic (c') starting at 3-4 with its point of inflection is clearly
inadmissible, and the drop after 20 or 21 years of age, shown by both parabola and
cubic, is, of course, only due to the anomalous character of the few girls over 18 left
in the schools. Actually the shrinkage of measurements does not begin till at least
26 years, and is then far more gradual than these curves indicate.

But, as in all fitting of this kind, we obtain the best fit we can within the range,
entirely at the expense of what may occur just outside the range. For this reason,
as E. Perrin* has pointed out, a good interpolation curve is usually a bad
extrapolation curve.

We might sum up our results for auricular height with age in girls by saying:
That the correlation is non-linear, effectively cubic; heteroscedastic, there being
increasing variability with growth; that while the total height frequency is not very
far from normal the array frequencies are slightly heteroclitic, but so very irregular in
sign, that probably we are dealing with a case of isocurtic homoclisy, to which the
sparsity of data in the extreme arrays gives an appearance of anomic heteroclisy.

(10.) Illustration C. — On the Skew Correlation between Size of Cell and Size of Body 

in Daphnia magna. 

Dr. E. Warren has dealt with this point in a memoir published in 'Biometrika,'
vol. II., pp. 255-9. The resulting regression curve of size of cell for given size of 
body is very far from linear, and it is quite clear that the correlation is skew. It 
has already been noted in 'Biometrika' that the relationship is considerably obscured
by the irregularities produced by ecdysis. Our object at present, however, is purely 
theoretical, namely, to show how a certain system of constants and of curves describes 
the actual correlationship, and for this purpose Dr. Warren's observations form as 
good material for graduation as we could expect to find. The following Table V. 
gives the observations with the working scales attached. I must refer to 
Dr. Warren's paper (p. 256) for the relation between the units of grouping on the 
working scales and those of the actual measurements on body and cell lengths. As 
far as correcting the raw moments is concerned, Sheppard's corrections were used 
for the cell sizes, but not for the body lengths, because the number of individuals in 
the latter case was perfectly arbitrary and there is no approach to high contact. The 

* 'Biometrika,' vol. III., p. 99.






product moments were also uncorrected. The product moments were found in both 
ways (see p. 35, footnote) and the results thus verified. 

Table V. gives the means, standard deviations, and third moments of the arrays ; 
the latter are all small and superficially irregular in sign. I think we may say that 
there is no marked and continuous heteroclisy. On the other hand, I think we may 
say that while the clitic curve deviates to and fro from a zero base, the scedastic
curve would fit better to a parabolic curve than to the straight line which is its 
mean. In other words, the variability of the cells increases with size of body (i.e., 
growth) up to a certain stage and then decreases again. This result is obscured by 
the fall of the variability after each ecdysis. Roughly the ecdyses produce a rhythm 
in all three curves, the regression curve, the scedastic curve, and the clitic curve. 
When the means of the arrays are above the regression cubic, then the ordinates of 
the scedastic curve are above their mean and those of the clitic curve show positive 
skewness ; when they are below the regression curve, we have lessened variability 
and negative skewness. In other words, the ecdyses are accompanied by lessened
cell variability and negative skewness of distribution. I think we may state that 
there is a nomic heteroscedasticity due to growth of body, giving first an increased 
variability with growth and afterwards a decrease with age. There is probably 
isocurtic homoclisy. Both of these are, however, obscured by a semi-rhythmic 
heteroscedasticity and heteroclisy introduced by the ecdyses. 

We now turn to the constants of the cell and body length distributions, merely 
noting that all these constants are given in terms of the units of the working scales. 

Body Length Constants.

Mean body length = 8.502488,
$\sigma_x = 3.864784$,
$\nu_2 = 14.936562$, $\nu_3 = -5.125806$, $\nu_4 = 432.769533$,
$\nu_5 = -425.276682$, $\nu_6 = 15192.5375$,
$\beta_1 = 0.007885$, $\beta_2 = 1.939793$, $\beta_3 = 0.043796$, $\beta_4 = 4.559091$,
$\sqrt{\beta_1} = -0.088798$,
$\phi_2 = 0.931908$, $\phi_3 = -0.232167$, $\phi_4 = 0.788409$.

Cell Constants.

Mean cell = 9.268657,
$\sigma_y = 2.541734$,
$\mu_2 = 6.460410$, $\mu_3 = 2.142362$, $\mu_4 = 123.921496$,
$\beta_1' = 0.017021$, $\beta_2' = 2.969111$.

Further

$\Sigma_M = 1.454600$,
$\lambda_2 = 2.115862$, $\lambda_4 = 15.142840$.

Hence

$(\lambda_4 - 3\lambda_2^2)/(4\lambda_2^2) = 0.095615$.



[Table V.: correlation table of size of cell against body length in Daphnia magna, with the working scales attached and the means, standard deviations, and third moments of the arrays. The entries of the table are illegible in this copy.]


We have next the product moments referred to the means

$p_{11} = 3.892863$, whence $r = 0.394862$,
$p_{21} = -12.104322$, $\epsilon = -0.281831$,
$p_{31} = 127.348064$, $\zeta = 0.098578$,
$p_{41} = -541.433465$, $\theta = -0.759344$.

Further, from $\Sigma_M$,

$\eta = 0.572287$.

From the constants for the arrays I deduced

$\chi_1 - 1 = -0.108148$, $\chi_2 = 0.088323$.

These are higher values of $\chi_1 - 1$ and $\chi_2$ than we have found in the first two
illustrations.
We now obtain, showing the contribution of each term of (xxxiii.),

$\Sigma_\eta^2 = \frac{1}{N}\{0.452240 - 0.002528 + 0.010803 - 0.013180 - 0.027875\}$.

Whence probable error of $\eta = 0.67449\,\Sigma_\eta = 0.0097$.

Had we calculated the probable error of $\eta$ from (xxxiv.), we should have found it
equal to 0.0101. The difference is greater than in the two previous illustrations, but
is only 0.0004, and this would have no significance in any practical use of the probable
error. We again conclude, therefore, that (xxxiv.) is sufficiently close to replace
(xxxiii.) in practice.

For the mean standard deviation of the weighted arrays we have

$\sigma_y\sqrt{1-\eta^2} = 2.084358$.

If we now examine the criteria for the nature of the regression, we have

$\eta^2 - r^2 = 0.171596$,

$\phi_2(\eta^2 - r^2) - \epsilon^2 = 0.080483$,

$\phi_2(\eta^2 - r^2) - \epsilon^2 - (\zeta\phi_2 - \epsilon\phi_3)^2/(\phi_2\phi_4 - \phi_3^2) = 0.079457$.

We should conclude, therefore, that linear regression is inadmissible, but that
parabolic or cubic will be moderately successful, the latter not very much better than
the former. Our moderate success only in this case is, of course, due to the irregularity
of the results to be graduated, the influence of the ecdyses being so disturbing
that we really need a curve periodically varying from the graduated regression curve.
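The three criteria can be recomputed directly from the constants printed for this illustration. The sketch assumes only those printed values and the form of the three expressions as they stand above; the function name is illustrative.

```python
def regression_criteria(eta, r, eps, zeta, phi2, phi3, phi4):
    """Expressions which should vanish for linear, parabolic and cubical
    regression respectively."""
    linear = eta ** 2 - r ** 2
    parabolic = phi2 * linear - eps ** 2
    cubic = parabolic - (zeta * phi2 - eps * phi3) ** 2 / (phi2 * phi4 - phi3 ** 2)
    return linear, parabolic, cubic


# Constants printed for cell size and body length in Daphnia.
linear, parabolic, cubic = regression_criteria(
    eta=0.572287, r=0.394862, eps=-0.281831, zeta=0.098578,
    phi2=0.931908, phi3=-0.232167, phi4=0.788409,
)
print(round(linear, 6), round(parabolic, 6), round(cubic, 6))
```

The small step from the second to the third value shows in figures why the cubic is here said to be not very much better than the parabola.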

We have the following regression curves:

(a.) Straight line:

$Y_{x_p} = 0.259687\,X_p.$




(b.) Parabola from (lxv.):

$Y_{x_p} = 1.097690 + 0.236135\,X_p - 0.073490\,X_p^2.$

The maximum occurs when $X_p = 1.6066$, and is given by $Y_{x_p} = 1.2874$, thus occurring
within the limits of observation.*

(c.) Cubic from (lix.):

$Y_{x_p} = 0.752856 + 0.193058\,X_p - 0.049817\,X_p^2 + 0.001710\,X_p^3.$

In all these cases $Y_{x_p}$ and $X_p$ are measured from the means of the cell and body
lengths, or from 9.268657 and 8.502488 respectively.

Table VI. gives the calculated and observed results, and the whole system is 
represented in Diagram III. Either the parabola or cubic graduates quite well the 
results, allowing for the periodic deviation, and we may fairly describe the system as 
a heteroscedastic cubic regression with isocurtic homoclisy. The correlation ratio is 
very sensibly different from the correlation coefficient. The regression cubic does not 
differ widely from that given in 'Biometrika,' which was obtained without weighting
the means of the arrays, and by simply striking the best cubic of the given type 
through the points. 

Table VI. $y_{x_p}$ = Mean Cell Length for Given Body Length in Daphnia.

$x_p$ = body length.   Regression line.   Regression parabola.   Regression cubic.   Observed.
        1                  7.320              4.458                  5.047              5.300
        2                  7.580              5.724                  6.190              5.833
        3                  7.840              6.842                  7.166              7.790
        4                  8.099              7.813                  7.986              8.060
        5                  8.359              8.638                  8.661              9.473
        6                  8.619              9.315                  9.200              8.436
        7                  8.879              9.846                  9.613              8.596
        8                  9.138             10.229                  9.912             10.267
        9                  9.398             10.466                 10.106             10.761
       10                  9.658             10.555                 10.205             11.027
       11                  9.917             10.498                 10.220             10.963
       12                 10.177             10.293                 10.161              9.100
       13                 10.437              9.942                 10.038              9.000
       14                 10.696              9.443                  9.861             10.036
       15                 10.956              8.798                  9.642             10.317



(11.) Illustration D. — On the Skew Correlation between Number of Branches to the 
Whorl and Position of the Whorl on the Stem in Equisetum arvense.

I have selected this example not on account of any biological importance, because 
the material is — especially with regard to the first and last two whorls — unsatisfactory 
either on account of irregularity or of insufficiency of material. It has been taken 



* Actual values on working scales, $x_p = 10.1091$ and $y_{x_p} = 10.5560$.






purely from its statistical interest, because it gives a series with markedly skew 
correlation, having a regression curve of a rough S-shaped character. If we omit 
the first and last whorls, we get, as I have already shown,* a remarkably close fit 
with a cubical regression curve. My present object, however, is not to consider any 
law of growth, but merely a mass of statistical material, to be dealt with by the 
processes of the present paper. 

We may anticipate that the irregularities of the series, indicated in the memoir
just referred to, will make themselves manifest in a less satisfactory fitting of the
regression curve than occurs when we deal with the more homogeneous group of
equally weighted whorls fitted in the diagram of that paper. Table VII. gives the
data, with the means, standard deviations, and third moments of each array.

The axis of $x$ shall be taken to give the position of the whorl on the stem and that
of $y$ to denote the number of branches. We require the regression curve of $y$ on $x$,
or the probable number of branches on a whorl in a given position. We shall not
use Sheppard's corrections for the moments of either the $x$ or $y$-characters, as high
contact certainly does not hold for both at the low-value ends of their ranges.

We have the following constants:

Position Constants.

Mean position = 6.403315,
$\sigma_x = 3.542604$,
$\nu_2 = 12.550046$, $\nu_3 = 8.249534$, $\nu_4 = 319.515824$,
$\nu_5 = 644.095176$, $\nu_6 = 11203.5814$,
$\beta_1 = 0.034429$, $\beta_2 = 2.028625$, $\beta_3 = 0.214190$, $\beta_4 = 5.667884$,
$\sqrt{\beta_1} = 0.185550$,
$\phi_2 = 0.994196$, $\phi_3 = 0.592384$, $\phi_4 = 1.518136$.

Branch Constants.

Mean number of branches = 7.216851,
$\sigma_y = 3.278499$,
$\mu_2 = 10.748557$, $\mu_3 = -24.313478$, $\mu_4 = 245.811660$,
$\beta_1' = 0.476044$, $\beta_2' = 2.127658$.

Further

$\Sigma_M = 2.789949$,
$\lambda_2 = 7.783815$, $\lambda_4 = 140.441685$.

Hence

$(\lambda_4 - 3\lambda_2^2)/(4\lambda_2^2) = -0.170503$.

We have next the product moments referred to the means

* 'Proc. Roy. Soc.,' vol. 71, p. 308.







[Table VII.: correlation table of number of branches to the whorl against position of the whorl on the stem in Equisetum arvense, with the means, standard deviations, and third moments of each array. The entries of the table are illegible in this copy.]



$p_{11} = -8.225585$, whence $r = -0.708222$,
$p_{21} = -21.471321$, $\epsilon = -0.390436$,
$p_{31} = -205.084042$, $\zeta = +0.029733$,
$p_{41} = -917.984938$, $\theta = -0.960212$.

Further, from $\Sigma_M$,

$\eta = 0.850984$.



From the constants for the arrays we deduce

$\chi_1 - 1 = -0.356367$, $\chi_2 = -0.312952$.

We now obtain, showing the contribution of each term of (xxxiii.),

$\Sigma_\eta^2 = \frac{1}{N}\{0.076080 - 0.157932 + 0.055359 + 0.079662 + 0.038579\}$.

Whence probable error of $\eta = 0.67449\,\Sigma_\eta = 0.0054$.

Had we calculated the probable error of $\eta$ from (xxxiv.) we should have found it
equal to 0.0049. The difference 0.0005 is not of importance for practical purposes.
Yet in this case it is clear that the values of $\chi_1 - 1$ and $\chi_2$ are very sensible. Thus we
see that a very marked heteroscedastic and heteroclitic system with continuously
changing standard deviation and skewness scarcely affects for practical purposes
(i.e., to three significant figures) the probable error of $\eta$. All four of our illustrations
therefore confirm the conclusion that:

For practical purposes the probable error of the correlation ratio, $\eta$, may be taken
as $0.67449\,(1-\eta^2)/\sqrt{N}$.

Our Diagram IV. gives the values of the relative standard deviations of the arrays,
or, $\sigma_{n_x}/\sigma_y$, the horizontal line giving $\sqrt{1-\eta^2} = 0.5252$, or the mean value of the relative
standard deviations of the weighted arrays. We have also the clitic curve giving
$\tfrac{1}{2}\sqrt{\beta_1}$ for each array.* The remarkable smoothness of these scedastic and clitic curves
in this case indicates how far certain types of correlation surfaces diverge from pure
normality of distribution, the divergence being obviously nomic.

We now turn to the regression curves and write down the conditions for the
different types; the three expressions should be zero for linear, parabolic, and
cubical regression respectively

$\eta^2 - r^2 = 0.222596$,

$\phi_2(\eta^2 - r^2) - \epsilon^2 = 0.068864$,

$\phi_2(\eta^2 - r^2) - \epsilon^2 - (\zeta\phi_2 - \epsilon\phi_3)^2/(\phi_2\phi_4 - \phi_3^2) = 0.010127$.

* $\tfrac{1}{2}\sqrt{\beta_1}$ = difference between mode and mean divided by standard deviation = skewness in the case of
skew-curves of Type III. ('Phil. Trans.,' A, vol. 186, p. 373), and may be taken as a reasonable measure of
the skewness for those cases in which the fuller form involving $\beta_2$ would involve too laborious calculations.
If in equation (xii.) of the present memoir we put $\beta_2 = 3 +$ a small quantity, and remember that $\beta_1$ is itself
a small quantity, we see that the more correct formula for the skewness involving $\beta_2$ reduces, neglecting
terms of 2nd order, to $\tfrac{1}{2}\sqrt{\beta_1}$.






We see at once that the straight line is inadmissible, the parabola will not be very
good, and the cubic only moderately appropriate. The conditions are not nearly so
closely fulfilled as in the cases of woodruff and head heights; the last two are better
than in the case of Daphnia cells, but while the deviations in the case of Daphnia
were irregular, there being no approximate smoothness in the scedastic or clitic
curves, we shall find here more uniform deviations which would probably be partially
allowed for by a quartic regression curve.

The following are the regression curves:

(a.) Straight line:

$Y_{x_p} = -0.655423\,X_p.$

(b.) Parabola from (lxv.):

$Y_{x_p} = 1.551307 - 0.574171\,X_p - 0.123610\,X_p^2.$

The maximum ordinate is at the position $X_p = -2.3225$, or $x_p = 4.0808$, with
maximum number of branches $y_{x_p} = 9.435$.

(c.) Cubic from (lvi.):

$Y_{x_p} = 1.590413 - 0.987694\,X_p - 0.137641\,X_p^2 + 0.016605\,X_p^3.$

In all cases $X_p$ and $Y_{x_p}$ are measured from the mean position and the mean number
of branches, i.e., 6.403315 and 7.216851 respectively.

The following table contains the calculated and observed results : — 

Table VIII. Mean Number of Branches to each Whorl in Equisetum.

Position.   Regression line.   Regression parabola.   Regression cubic.   Observed.   Regression cubic without first whorl.
    1          10.758              8.262                  7.506             7.619        [8.207]
    2          10.103              8.900                  9.070             9.294         8.929
    3           9.447              9.291                  9.920             9.627         9.869
    4           8.792              9.434                 10.156             9.730        10.161
    5           8.137              9.330                  9.876             9.643         9.911
    6           7.481              8.980                  9.182             9.427         9.224
    7           6.826              8.382                  8.172             8.732         8.206
    8           6.170              7.536                  6.947             7.297         6.962
    9           5.515              6.444                  5.605             6.666         5.599
   10           4.859              5.104                  4.247             3.964         4.223
   11           4.204              3.517                  2.971             2.443         2.939
   12           3.549              1.683                  1.879             0.866         1.864
   13           2.893             -0.399                  1.069             0.462         1.072
   14           2.238             -2.727                  0.641             0.333         0.700
   15           1.582             -5.303                  0.694             0.260         0.844
   16           0.927             -8.126                  1.328             1.000         1.610



In the last column I have placed the results of re-working the whole system,
omitting the first whorl as largely influenced by the ground condition at the foot of
the stem.* The improvement of fit is not sufficiently great to justify a publication of
all the constants for the distribution in this modified case. But there is improvement
for the higher whorls, which are so few in number as to be wholly insignificant when
compared with the weight of the first few low whorls.

It will be noticed at once that the line and the parabola (which gives at the top of
the stem negative numbers!) are absolutely unsuitable for representing the facts of
the case. The cubic is better and certainly gives the general trend of the observations,
but in this our last illustration we have clearly reached the limit of material to
which such cubical regression can be satisfactorily applied. See Diagram V.

(12.) Quartic Regression. 

It seemed of some interest in this case of Equisetum to ascertain whether any real 
improvement in description would be reached by considering the quartic regression 
curve. I briefly indicate the theory in this case as developed from the general 
method in the footnote, p. 25. We shall now have 

Eliminating $b_0$ and $b_1$, by the processes familiar to us from the case of cubical
regression, we have

Y,>,=r (X,/<r,)+fe,{(X,/<r,)«- ^/J, ( V<'*)- 1 } 



Hence as before 



^=M8+^3^4+M« \ (IxxL), 

where i^j, ^3, and ^^ are given as before by (li. and liv.), while 

^i—Pi—Pz—Pi (Ixxii.), 

<f>Mfi6-M3-Mi)Iy/Wi (Ixxiii.), 

<l>MPA-fiz'-PAWi (ixxiv.). 

and 

$$\beta_5 = \mu_7/\sigma_x^7, \qquad \beta_6 = \mu_8/\sigma_x^8 \qquad \text{(lxxv.)}.$$

Solving, we have 

j^ _ ^(^^i^ri^t 1(^4^1^^^ (Ixxvi.), 



* 'Roy. Soc. Proc.,' vol. 71, pp. 308-310.




and 

Substituting in (lxx.), the solution is completed. The advantage of this form is that
we see clearly the modifications made in $b_2$ and $b_3$ as we pass from cubical to quartic
regression. On the other hand, $\phi_6$ and $\phi_7$, as shown by (lxxv.), involve the 7th and
8th moments of the x-character. These are not only very laborious to calculate, but,
as we have already shown, are as a rule very untrustworthy.

If we proceed as on p. 26, equation (lvii.), we find

$$\eta^2 - r^2 = b_2\epsilon + b_3\zeta + b_4\theta \qquad \text{(lxxviii.)}.$$

Using this and not the third equation of (lxxi.), we replace (lxxvi.) by

&4 = (<^2^4-<^3')- 1 , "^^ hihh-j'^'')^ . (Ixxix.). 

This equation for $b_4$ only involves the 7th and not the 8th moment, but like the
corresponding form (lx.) suffers from being a ratio of small quantities. (lxxvii.)
completes the solution as before.
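The weakness of a ratio of small quantities can be made concrete with a short numerical sketch. The figures below are purely hypothetical and are not taken from the memoir; they only show how the same absolute error in numerator and denominator disturbs a ratio of small terms far more than a ratio of large ones.

```python
def ratio_spread(numerator, denominator, error):
    """Range of numerator/denominator when each term may be off by +/- error."""
    values = [
        (numerator + dn) / (denominator + dd)
        for dn in (-error, error)
        for dd in (-error, error)
    ]
    return min(values), max(values)


# Hypothetical figures: both ratios equal 0.5 when the terms are exact, and
# every term carries the same absolute uncertainty of 0.002.
small_terms = ratio_spread(0.005, 0.010, 0.002)
large_terms = ratio_spread(0.500, 1.000, 0.002)

if __name__ == "__main__":
    print("ratio of small terms lies between", small_terms)
    print("ratio of large terms lies between", large_terms)
```

Under these assumed errors the ratio of small terms is left almost undetermined, while the ratio of large terms is barely moved.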

(Ixxvii.) and (Ixxix.) in conjunction give us a necessary condition for quartic 
regression. We can indeed now write the whole series of conditions as follows : — 

Linear regression : 

Parabolic regression : 
Cubical regression : 

Quartic regression : 

<^2 (<^2<^4 — ^/) (^2^4 — ^3^)( <^2<^4<^7 ^ ^7^3^ ^ ^4^6^ — ^2^6*^ + ^^gf^g^^e) 

(Ixxx.). 
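As a reading aid only: the five-term expression in the $\phi$'s that closes the quartic condition has the shape of an expanded symmetric determinant of the third order. The arrangement of the $\phi$'s in the determinant below is an assumption made for illustration; the expansion itself is the standard identity for a symmetric determinant.

```latex
\begin{vmatrix}
\phi_2 & \phi_3 & \phi_5 \\
\phi_3 & \phi_4 & \phi_6 \\
\phi_5 & \phi_6 & \phi_7
\end{vmatrix}
= \phi_2\phi_4\phi_7 - \phi_7\phi_3^2 - \phi_4\phi_5^2 - \phi_2\phi_6^2
  + 2\phi_3\phi_5\phi_6 .
```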

We now have a third possibility: we can get rid of the fourth product moment $\theta$
from the value of $b_4$ and write it:




While this value of $b_4$ does not suffer like (lxxix.) from being the ratio of small
quantities, and would a priori appear to save the calculation of $\theta$, yet the right sign of
the root may not be obvious on inspection, so that an actual determination of $\theta$ to find
the sign of $b_4$ may after all be needful. If (lxxx.) were absolutely satisfied, (lxxxi.),
(lxxix.) and (lxxvi.) would lead to identical results; but this will rarely be true in
practice. In any of the three cases $b_2$ and $b_3$ will be given by (lxxviii.). On the
whole, I consider that (lxxxi.) and (lxxvi.) will give the better results, and probably
the former the best, but it will generally require as much arithmetic as the latter.

(13.) Illustration E. Calculation of the Quartic Regression Curve in the Case
of Equisetum arvense.

The only new constants required are : 

i^7=43,207-386, whence ^85= 1-144,882, 

^8=507,649-540, ^8^= 20-463,633, 

and : 

(^6=3-425,069, (^a=3*452,046, 

4^7=15-015.792. 
These lead us to : 

^4^j"^<Ao = 2-723,384, ^j-^j^g = 1-211,194, 

9«94"-98 9294— 9s 

^♦=! ^2» ^8. ^5 =1-745,622. 

<^8> <^4> ^« 

Our successive conditions are therefore:

$$\eta^2 - r^2 = 0.222596,$$

$$\eta^2 - r^2 - \epsilon^2/\phi_2 = 0.069266,$$

$$\eta^2 - r^2 - \epsilon^2/\phi_2 - (\zeta\phi_2 - \epsilon\phi_3)^2/\{\phi_2(\phi_2\phi_4 - \phi_3^2)\} = 0.010186,$$

whence we see the successive approximations to the fulfilment of the conditions.
Clearly great gains arise when we pass from linear to parabolic, and from parabolic
to cubic regression, but the advance is not so conspicuous when we pass to quartic
regression.
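The three residuals just quoted can be turned into the share of $\eta^2 - r^2$ removed at each step. This is simple arithmetic on the figures in the text; the variable names, and the reading of each figure as the remainder left after the named term is allowed for, are assumptions made for illustration.

```python
# Residuals of the successive conditions, as quoted in the text.
after_linear = 0.222596     # eta^2 - r^2
after_parabolic = 0.069266  # remainder once the parabolic term is allowed for
after_cubic = 0.010186      # remainder once the cubic term is also allowed for

removed_by_parabola = after_linear - after_parabolic
removed_by_cubic = after_parabolic - after_cubic

if __name__ == "__main__":
    for label, amount in (
        ("removed by parabolic term", removed_by_parabola),
        ("removed by cubic term", removed_by_cubic),
        ("left for quartic and higher terms", after_cubic),
    ):
        print(f"{label}: {amount:.6f} ({amount / after_linear:.1%} of eta^2 - r^2)")
```

On these figures only a small remainder is left for any quartic term to account for, which agrees with the remark that the advance on passing to quartic regression is not conspicuous.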




We have:

From (lxxvi.)   $b_4 = 0.044517$, and $b_2 = -0.648122$, $b_3 = 0.171260$,
From (lxxix.)   $b_4 = 0.151842$, and $b_2 = -0.940410$, $b_3 = 0.041981$,
From (lxxxi.)   $b_4 = 0.025999$, and $b_2 = -0.597691$, $b_3 = 0.193688$.



The equations to the three corresponding quartics are:

(a.) $$Y_p = 1.724611 - 0.913208\,X_p - 0.169311\,X_p^2 + 0.012629\,X_p^3 + 0.000927\,X_p^4.$$

(b.) $$Y_p = 2.047717 - 0.734966\,X_p - 0.245667\,X_p^2 + 0.003096\,X_p^3 + 0.003161\,X_p^4.$$

(c.) $$Y_p = 1.668788 - 0.944192\,X_p - 0.156137\,X_p^2 + 0.014283\,X_p^3 + 0.000541\,X_p^4.$$

The values of $Y_p$ and $X_p$ are as before measured from the means, or 7.216851 and
6.403315 respectively.
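The behaviour of the three quartics at the two ends of the stem can be compared with a short sketch. The coefficients and means are those quoted in the text; the dictionary layout and function name are illustrative assumptions.

```python
MEAN_POSITION = 6.403315
MEAN_BRANCHES = 7.216851

# Coefficients in ascending powers of X, keyed by the label used in the text.
QUARTICS = {
    "a": [1.724611, -0.913208, -0.169311, 0.012629, 0.000927],
    "b": [2.047717, -0.734966, -0.245667, 0.003096, 0.003161],
    "c": [1.668788, -0.944192, -0.156137, 0.014283, 0.000541],
}


def quartic_ordinate(label, position):
    """Mean number of branches given by quartic `label` at a whorl position."""
    x_dev = position - MEAN_POSITION
    return MEAN_BRANCHES + sum(
        c * x_dev ** power for power, c in enumerate(QUARTICS[label])
    )


if __name__ == "__main__":
    for position in (1, 8, 16):
        row = ", ".join(
            f"({label}) {quartic_ordinate(label, position):.3f}" for label in QUARTICS
        )
        print(f"position {position:2d}: {row}")
```

At the sixteenth whorl the large fourth-power coefficient of quartic (b) carries its ordinate well above those of (a) and (c).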

The values of the observed and calculated ordinates are given in Table IX., and 
the graph of the results in the lower half of Diagram V. 



Table IX. Mean Number of Branches to Whorl in Equisetum deduced from Quartic
Regression.

Position.   Quartic (a).   Quartic (b).   Quartic (c).   Observed.

    1          7.731          8.269          7.637         7.619
    2          8.950          8.662          9.000         9.294
    3          9.715          9.222          9.800         9.627
    4         10.014          9.674         10.073         9.780
    5          9.858          9.816          9.866         9.643
    6          9.281          9.521          9.240         9.427
    7          8.339          8.740          8.270         8.792
    8          7.109          7.498          7.042         7.297
    9          5.692          5.898          5.656         6.666
   10          4.209          4.116          4.225         3.964
   11          2.807          2.407          2.875         2.443
   12          1.651          1.100          1.745         0.866
   13          0.930          0.600          0.987         0.462
   14          0.857          1.389          0.766         0.333
   15          1.665          4.022          1.259         0.260
   16          3.609          9.133          2.657         1.000



From these results we deduce the following conclusions:

(i.) That the use of a quartic instead of a cubic regression curve has not very
markedly bettered the fit. The failure to get a closer fit lies largely in the nature of
the material. The number of plants with more than 13 whorls is very few, and their
contribution allows little weight to the tail of the regression curve. Further, all our
attempts to fit a smooth regression curve show that the observed data are unduly
flattened at the top. If we confine ourselves to a homogeneous series of 11 plants
with ten whorls apiece, we get a remarkably good fit.* The S-shape of the
regression line as indicated in both cubic and quartic does, however, appear to be
characteristic of the nature of the plant, and I take it that more ample material
would allow of a closer analytical description by a simple cubic. I doubt whether for
practical statistics the use of the quartic will often be requisite.

(ii.) The comparative failure of the quartic (b) shows us that a formula like (lxxix.)
is of small service. This corresponds fully to our experience in the use of (lx.) in the
case of the cubic. In both cases we get rid of a high moment by making a certain
constant the ratio of two small quantities, and experience shows us that the result is
unsatisfactory. It is accordingly preferable to use formulae involving high moments
of one variable rather than those with a ratio of small quantities.

(iii.) The quartic (c) appears as good as, if not slightly better than, quartic (a). In
(c) we have got rid of a high product moment, $\theta$, by supposing the quartic condition
(lxxx.) rigidly fulfilled. This of course is not the case. It is clear that product
moments like $\theta$ of the 5th order are far from advantageous, and this is the same principle
which was in evidence when we found (lxv.) giving better results than (lxiv.) for
parabolic regression. Hence we must further conclude that the use of third, fourth or
fifth product moments is disadvantageous as compared respectively with fifth to eighth
moments of one variable. Or, a moment two degrees higher is preferable to a product
moment in calculating correlation values. This is, I think, consonant with our
knowledge of the relative magnitude of the probable errors in the two cases.

(14.) General Conclusions. 

(i.) The present paper provides us with a general method of dealing with the
regression line and the variability of arrays in the case of skew correlation, without
any assumption as to the analytical form of the skew correlation surface.

(ii.) It provides a nomenclature and classification of the types of array variability
which may be of service.

Arrays are either homoclitic or heteroclitic, according as their skewnesses are of
equal magnitude or not. Arrays are further homoscedastic or heteroscedastic,
according as their standard deviations are alike or different. Skew arrays are termed
allocurtic; if arrays are symmetrical about their mean, they are isocurtic.

A heteroclitic system of arrays may be nomic or anomic, according as the skewness
of the arrays changes continuously or irregularly with the position of the array.

A heteroscedastic system of arrays is also either nomic or anomic, according as the
standard deviation of the arrays changes continuously or irregularly with the
position of the arrays. Anomic heteroclisy and anomic heteroscedasticity probably only
signify that our material is either heterogeneous or too sparse to free us from the
large errors of random sampling in the extreme arrays. Still the terms will be
found of use in describing the actual data.

* 'Roy. Soc. Proc.,' vol. 71, p. 308.

The curve in which the skewness of the array is plotted to its position is termed
the clitic curve; the curve in which the ratio of the standard deviation of the array
to the standard deviation of the character in the population at large is plotted to
position is termed a scedastic curve.

(iii.) The types of regression have been classified into linear, parabolic, cubic and
quartic. For most practical purposes the first three suffice. Necessary criteria
have been given for each case. But as in the case of the skew frequency of one
character, an indefinite number of conditions ought theoretically to be fulfilled.
Practically in dealing with frequency, no criteria are absolutely fulfilled, and the
probable errors of the expressions used become unmanageable as we ascend in the
scale. We must therefore be content to estimate the degree of approximation with
which one or two necessary criteria are satisfied.

The fundamental test of deviation from the familiar form of linear regression is the
inequality of the correlation coefficient $r$ and the newly introduced correlation
ratio $\eta$. The probable error of this latter is determined. It is shown that
$\sigma_y\sqrt{1 - \eta^2}$ is the mean standard deviation of a system of arrays in skew correlation.
The ease with which $\eta$ can be calculated suggests that in many cases it should
accompany, if not replace, the determination of the correlation coefficient.
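The relation between $r$, $\eta$ and the variability of the arrays can be illustrated on a small invented set of arrays. The sketch assumes the usual definition of the correlation ratio, namely the frequency-weighted standard deviation of the array means divided by the standard deviation of $y$; the data and all names are hypothetical and are not taken from the memoir.

```python
from math import sqrt

# Hypothetical arrays: x value -> list of y observations in that array.
ARRAYS = {
    1: [2.0, 3.0, 4.0],
    2: [5.0, 6.0, 7.0, 6.0],
    3: [6.0, 7.0, 8.0],
    4: [4.0, 5.0, 6.0, 5.0],
}


def correlation_summary(arrays):
    """Return (r, eta, variance of y, frequency-weighted mean array variance)."""
    n = sum(len(ys) for ys in arrays.values())
    mean_x = sum(x * len(ys) for x, ys in arrays.items()) / n
    mean_y = sum(sum(ys) for ys in arrays.values()) / n
    var_x = sum(len(ys) * (x - mean_x) ** 2 for x, ys in arrays.items()) / n
    var_y = sum((y - mean_y) ** 2 for ys in arrays.values() for y in ys) / n
    cov = sum(
        (x - mean_x) * (y - mean_y) for x, ys in arrays.items() for y in ys
    ) / n
    r = cov / sqrt(var_x * var_y)

    array_means = {x: sum(ys) / len(ys) for x, ys in arrays.items()}
    var_of_means = sum(
        len(ys) * (array_means[x] - mean_y) ** 2 for x, ys in arrays.items()
    ) / n
    eta = sqrt(var_of_means / var_y)

    within = sum(
        (y - array_means[x]) ** 2 for x, ys in arrays.items() for y in ys
    ) / n
    return r, eta, var_y, within


if __name__ == "__main__":
    r, eta, var_y, within = correlation_summary(ARRAYS)
    print(f"r = {r:.4f}, eta = {eta:.4f}")
    print(f"mean array variance = {within:.4f}")
    print(f"var_y * (1 - eta^2)  = {var_y * (1 - eta ** 2):.4f}")
```

For these invented arrays the regression is plainly not linear, so $\eta$ exceeds $|r|$, while the weighted mean of the array variances agrees with $\sigma_y^2(1 - \eta^2)$.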

In the determination of the constants of the regression curve we must use
moments and product moments. The limitations to the order of the curve used
depend: (a) on the labour of the arithmetic, (b) on the increasing probable errors of
the higher moments and product moments. For these reasons it seems idle to propose
going beyond the 6th to 8th moments, or the 3rd to 5th product moments. Practical
experience suggests that little is to be gained by using moments beyond the 6th, or
product moments beyond the 3rd. A quartic regression curve may be useful
occasionally, but it has yet to justify its necessity. As our object is not to reproduce
the given data, but to provide a graduation for them, which smooths down the
errors of random sampling, we believe that any legitimate and practical theory must
discard the high moments and high product moments with which Thiele and Lipps
propose to deal.

(iv.) There is one point to which reference ought to be made. Some reader may
enquire why the method of my paper on curve fitting* should not be applied
to these regression curves in general, as we have in practice once or twice
already applied it. It would seem that that method is the easier, involving in the
case of the quartic only quantities analogous to our $r$, $\epsilon$, $\zeta$ and $\theta$. The answer is
straightforward: that process supposes every $y_x$ to have equal weight, or $n_x$ to be
the same for each array. Hence the higher moments of the x-character, which are
really involved, can be written down without calculation once and for all.* The
complexity of our present investigation arises from the introduction of the weighting
into the calculation of the moments of the x-character, as well as into that of the
product moments $r$, $\epsilon$, $\zeta$, $\theta$. Our results therefore, although they might not look so
good on a graph of the regression curve, would be markedly better, if due weight
were given to the frequency of each array. The difference of the two conceptions is
comparable to the determination of the regression on the one hand from the
correlation coefficient, and on the other from merely striking a line through the
plotted means of the arrays. The method of moments in the present case, if we
except the use of $\eta$, is identical with that of fitting a curve to a continuum in space
by the method of least squares.

* "On the Systematic Fitting of Curves to Observations and Measurements." 'Biometrika,'
vol. I., pp. 265-303, and vol. II., pp. 1-23, especially the latter, pp. 11-16.
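The contrast drawn here between weighting each array by its frequency and merely striking a line through the plotted means can be sketched for the simplest case of a straight line. The arrays below are invented for illustration, with deliberately unequal sizes; none of the names or figures comes from the memoir.

```python
# Hypothetical arrays: x value -> y observations; array sizes are unequal.
ARRAYS = {
    1: [9.0, 10.0, 11.0, 10.0, 9.0, 11.0],
    2: [8.0, 9.0, 8.0, 9.0],
    3: [7.0, 6.0],
    4: [1.0],
}


def slope_through_means(arrays, weighted):
    """Least-squares slope fitted to the array means, with or without
    weighting each mean by the number of observations in its array."""
    xs = list(arrays)
    means = [sum(arrays[x]) / len(arrays[x]) for x in xs]
    weights = [len(arrays[x]) if weighted else 1 for x in xs]
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * m for w, m in zip(weights, means)) / total
    sxy = sum(w * (x - x_bar) * (m - y_bar) for w, x, m in zip(weights, xs, means))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    return sxy / sxx


def slope_from_all_observations(arrays):
    """Regression slope of y on x from every individual observation,
    i.e. r * sigma_y / sigma_x."""
    pairs = [(x, y) for x, ys in arrays.items() for y in ys]
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    return sxy / sxx


if __name__ == "__main__":
    print("from all observations:   ", slope_from_all_observations(ARRAYS))
    print("weighted line of means:  ", slope_through_means(ARRAYS, weighted=True))
    print("unweighted line of means:", slope_through_means(ARRAYS, weighted=False))
```

The frequency-weighted line through the means reproduces the slope obtained from all the observations, whereas the unweighted line gives the single-plant array at the far end as much say as the array of six.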

(v.) No stress whatever is laid on the actual instances here selected for illustration of
the methods of this paper. I have merely chosen out of available material cases in
which I had come across skew regression of various types. Thus we find:

(a.) The correlation of the number of branches and position of the whorl in
Asperula odorata is practically parabolic, homoscedastic and of nomic heteroclisy.

(b.) The correlation between auricular height of head and age in girls is cubical,
of nomic heteroscedasticity and of anomic heteroclisy. It is probably really a case
of isocurtosis.

(c.) The correlation of size of cell and size of body in Daphnia magna, allowing
for the irregularities produced by the ecdyses, is parabolic or cubic, of nomic
heteroscedasticity, and probably, but for the above-mentioned irregularities, of
isocurtic homoclisy.

(d.) The correlation of the number of branches and position of the whorl in
Equisetum arvense is cubical or possibly even quartic, of markedly nomic
heteroscedasticity and markedly nomic heteroclisy.

It is not impossible that slips have occurred in the lengthy arithmetic involved, but
every important piece of work has been done independently twice, once by Dr. Alice
Lee, whom I have most heartily to thank for her unwearying assistance, and once
by myself. To preserve uniformity of working, the constants have in each case
been carried to six figures. This involves little or no additional trouble, using as we
do mechanical calculators. The final results are of course of no value beyond their
probable errors, which will be in the second or third place of figures. No doubt I
shall be told that there is a show of accuracy in the number of decimal figures
retained, which does not really exist. It does not exist (and I am as fully conscious
of its non-existence as any would-be critic) so far as our results fit the actual
population, of which we have but a random sample. The figures, however, are of
importance, as far as testing accuracy of fit of result to actual sample goes. The
cubic or quartic curves may have coefficients insensible before the third or fourth
figure of decimals, and these coefficients have to be multiplied occasionally by
abscissae of the third or fourth powers of 7 to 9. Hence to get ordinates true, as
far as the sample goes, to the second or third figure, we require to work to a fairly
high number of figures. There is no magic in six figures, four or five would probably
satisfy another worker, but they are easily read off the calculator we use, and if the
constants had been tabled only to four or five, no reader would have been able to
agree exactly, if he wished to test any of our results, even to three figures, with the
final ordinates.

* 'Biometrika,' vol. II., p. 12.
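The argument for carrying six figures can be put as a rough bound. If a coefficient is rounded to a given number of decimal places, it may be wrong by half a unit in the last place, and that error is multiplied by the third or fourth power of the abscissa. The sketch below uses the mean position quoted earlier in the text and the sixteenth whorl as the extreme abscissa; the function name and the choice of cases are illustrative assumptions.

```python
MEAN_POSITION = 6.403315  # mean whorl position quoted in the text


def worst_rounding_error(position, power, decimals):
    """Largest change in an ordinate caused by rounding the coefficient of
    X**power to `decimals` places, X being measured from the mean position."""
    x_dev = position - MEAN_POSITION
    half_unit = 0.5 * 10.0 ** (-decimals)
    return half_unit * abs(x_dev) ** power


if __name__ == "__main__":
    print("decimals kept   cubic term   quartic term")
    for decimals in (3, 4, 5, 6):
        cubic = worst_rounding_error(16, 3, decimals)
        quartic = worst_rounding_error(16, 4, decimals)
        print(f"{decimals:13d}   {cubic:10.5f}   {quartic:12.5f}")
```

With four decimals the quartic term alone may shift the last ordinate in the first place of decimals, whereas with six the possible shift falls to the third place.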



[Diagram I. Skew Correlation in Asperula odorata. Curves shown: scedastic curve, regression curves, clitic curve.]

[Diagram with horizontal axis "Age of girl"; caption not preserved.]

[Diagram III. Skew Correlation between Sizes of Cell and Body in Daphnia. Curves shown: regression cubic, regression parabola. Horizontal axis: size of body.]

[Diagram IV. Skew Correlation between Branches and Position of Whorl in Equisetum: Scedastic and Clitic Curves.]

[Diagram V. Skew Correlation between Branches and Position of Whorl in Equisetum: Regression Curves. Curves shown: regression line, regression parabola, regression cubic, quartic (a), quartic (b), quartic (c). Horizontal axis: position of whorl.]



